Abstract

We consider the problem of stably computing the Walsh-Hadamard Transform (WHT) of some $N$-length
input vector in the presence of noise, where the $N$-point Walsh spectrum is $K$-sparse with $K = O(N^\delta)$ scaling
sub-linearly in the input dimension $N$ for some $0 < \delta < 1$. Note that if $K$ is linear in $N$ (i.e. $\delta = 1$), then, similar
to the standard Fast Fourier Transform (FFT) algorithm, the classic Fast WHT (FWHT) algorithm offers an
$O(N)$ sample cost and $O(N \log N)$ computational cost, which are order optimal. Over the past decade, there
has been a resurgence in research related to the computation of the Discrete Fourier Transform (DFT) for some
length-$N$ input signal that has a $K$-sparse $N$-point Fourier spectrum. In particular, through a sparse-graph code
design, our earlier work on the Fast Fourier Aliasing-based Sparse Transform (FFAST) algorithm [1] computes
the $K$-sparse DFT in time $O(K \log K)$ by taking $O(K)$ noiseless samples. Inspired by the coding-theoretic
design framework in [1], Scheibler et al. proposed the Sparse Fast Hadamard Transform (SparseFHT)
algorithm that elegantly computes the $K$-sparse WHT in the absence of noise using $O(K \log N)$ samples in
time $O(K \log^2 N)$. However, the SparseFHT algorithm explicitly exploits the noiseless nature of the problem,
and is not equipped to deal with scenarios where the observations are corrupted by noise, as is true in general.
Therefore, a question of critical interest is whether this coding-theoretic framework can be made robust to noise.
Further, if the answer is yes, what is the extra price that needs to be paid for being robust to noise?

In this paper, we show, quite interestingly, that there is no extra price that needs to be paid for being
robust to noise other than a constant factor. In other words, we can maintain the same scaling for the sample
complexity $O(K \log N)$ and the computational complexity $O(K \log^2 N)$ as those of the noiseless case, using
our proposed SParse Robust Iterative Graph-based Hadamard Transform (SPRIGHT) algorithm. Similar to
the FFAST algorithm [1] and the SparseFHT algorithm of Scheibler et al., the proposed SPRIGHT framework succeeds with
high probability with respect to a random ensemble of signals with sparse Walsh spectra, where the support
of the non-zero WHT coefficients is uniformly random. Experiments further corroborate the robustness of the
SPRIGHT framework as well as its scaling performance.


1 Introduction 

Ever since the introduction of orthonormal Walsh functions, the Walsh-Hadamard Transform (WHT) has gained
traction for signal analysis in place of the Discrete Fourier Transform (DFT) because of its simplicity in
computations and applicability in the design of practical systems like digital circuits. Starting off as the “poor man’s fast
Fourier Transform”, the WHT has been further deployed over the past few decades in image and video
compression, spreading code design in multiuser systems such as CDMA and GPS, and compressive sensing.
More recently, sparsity in the Walsh spectrum has been found in many real-world applications involving the processing of
large datasets, such as learning (pseudo) Boolean functions, decision trees and disjunctive normal form (DNF)
formulas. Therefore, it is of practical and theoretical interest to develop fast algorithms for computing the
WHT of signals with sparse or approximately sparse Walsh spectra. Traditionally, the WHT can be computed
using $N$ samples and $O(N \log N)$ operations via a recursive algorithm [6], [7] analogous to the Fast Fourier Transform
(FFT). However, these costs can be significantly reduced if the signal has a sparse Walsh spectrum.


1.1 Motivation and Contributions 

There has been a recent resurgence in research on computing the Discrete Fourier Transform (DFT) of signals that
have sparse Fourier spectra [10]-[14]. Since the WHT is a special case of a multidimensional DFT over the binary
field, recent advances in computing $K$-sparse $N$-point DFTs have provided insights in designing algorithms for
computing sparse WHTs. In particular, major progress has been made in breaking the “$N$-barrier” for computing
$N$-point sparse DFTs, which means that the sample complexity and computational complexity do not depend
on the signal dimension $N$. Using a sparse-graph code design, the Fast Fourier Aliasing-based Sparse
Transform (FFAST) algorithm [1] uses $O(K)$ samples and $O(K \log K)$ operations for any sub-linear sparsity
$K = O(N^\delta)$ with $0 < \delta < 1$, assuming a uniform support distribution. Under a similar uniform support distribution
for the WHT coefficients, the Sparse Fast Hadamard Transform (SparseFHT) algorithm developed by Scheibler et al. elegantly
computes a $K$-sparse $N$-point WHT with $K = O(N^\delta)$ using $O(K \log(N/K))$ samples and $O(K \log K \log(N/K))$
operations by following the sparse-graph code design in [1] for DFTs. When $K$ scales sub-linearly in $N$ as
$K = O(N^\delta)$ for some constant $0 < \delta < 1$, these results are hereby interpreted as achieving a sample complexity
$O(K \log N)$ and a computational complexity $O(K \log^2 N)$. A limitation of the SparseFHT algorithm is that it is
designed to explicitly exploit the noiseless nature of the underlying signals, and it is not clear how to generalize
it to noisy settings. A key question of theoretical and practical interest in this paper is: what price must be paid
to be robust to noise? Interestingly, in this paper we show that there are no extra costs in sample complexity and
computational complexity for being robust to noise, other than a constant factor determined by the signal-to-noise
ratio (SNR).

Inspired by the algorithm design from the FFAST algorithm in [1] and the noisy FFAST analysis in [15], we
consider the problem of computing a $K$-sparse $N$-point WHT from the input vector in the presence of noise, when
the sparsity $K = O(N^\delta)$ is sub-linear in the signal dimension $N$ for some $0 < \delta < 1$, assuming a uniform
support distribution. We develop a SParse Robust Iterative Graph-based Transform (SPRIGHT) framework to stably
compute the $K$-sparse $N$-length WHT at any constant SNR with high probability. In particular, our framework
achieves sub-linear run-time $O(K \log^2 N)$ using $O(K \log N)$ noisy samples, which maintains the same sample
and computational scaling as the noiseless case. This result also contrasts with the work on computing the sparse
DFT in the presence of noise [15], where the robustness to noise incurs an extra factor of $O(\log N)$ in terms of
the sample complexity, from $O(K)$ to $O(K \log N)$ (the same extra factor is manifested in the run-time as well).
This can be intuitively explained by the fact that the complex-valued $N$-point Fourier transform kernel has a “$1/N$
precision” while the binary-valued WHT kernel has a “bit precision”.


1.2 Notation and Organization 

Throughout this paper, the set of integers $\{0, 1, \cdots, N-1\}$ for some integer $N$ is denoted by $[N]$. Lowercase
letters, such as $x$, are used for the time domain expressions and uppercase letters, such as $X$, are used for the
transform domain signal. Any boldface lowercase letter such as $\mathbf{x} \in \mathbb{R}^N$ represents a column vector containing the
corresponding $N$ samples. The operator $\mathrm{supp}(\mathbf{x})$ takes the support set of the vector $\mathbf{x}$ and $|\cdot|$ takes the cardinality
of a certain set. The notation $\mathbb{F}_2$ refers to the finite field consisting of $\{0, 1\}$, with defined operations such as
summation and multiplication modulo 2. Furthermore, we let $\mathbb{F}_2^n$ be the set of $n$-dimensional column vectors with each
element taking values from $\mathbb{F}_2$. For any vector $\mathbf{i} \in \mathbb{F}_2^n$, denote by $\mathbf{i} = [i[1], \cdots, i[n]]^T \in \mathbb{F}_2^n$ the index vector
containing the binary representation of some integer $i$, with $i[1]$ and $i[n]$ being the least significant bit (LSB) and
the most significant bit (MSB), respectively. The inner product of two binary indices $\mathbf{i} \in \mathbb{F}_2^n$ and $\mathbf{j} \in \mathbb{F}_2^n$ is defined
by $\langle \mathbf{i}, \mathbf{j} \rangle = \sum_{t=1}^{n} i[t] j[t]$ with arithmetic over $\mathbb{F}_2$, and the inner product between two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^N$ is defined
as $\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{t=1}^{N} x[t] y[t]$ with arithmetic over $\mathbb{R}$. The sign function here is defined as

$$\mathrm{sgn}[x] = \begin{cases} 1, & x < 0 \\ 0, & x > 0 \end{cases} \tag{1}$$

such that $x = |x|(-1)^{\mathrm{sgn}[x]}$.

This paper is organized as follows. In Section 2, we present our input (signal) model and our goal, followed
by a summary of our main results. To motivate our design, we explain in Section 3 the main idea of our SPRIGHT
framework through a simple example. Then, we generalize the simple example and present the framework in
Section 4, followed by a detailed discussion of the noisy scenarios in our framework. Last but not
least, we briefly mention some machine learning applications that can potentially be cast as a sparse
WHT computation problem, followed by numerical experiments.

2 Problem Setup and Main Results 

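Given a signal $\mathbf{x} \in \mathbb{R}^N$ containing $N = 2^n$ samples $x[\mathbf{m}]$ indexed by $\mathbf{m} \in \mathbb{F}_2^n$ (i.e. the $n$-bit binary representation
of $m \in [N]$), its WHT coefficient is computed as

$$X[\mathbf{k}] = \frac{1}{\sqrt{N}} \sum_{\mathbf{m} \in \mathbb{F}_2^n} x[\mathbf{m}] (-1)^{\langle \mathbf{k}, \mathbf{m} \rangle} \tag{2}$$

where $\mathbf{k} = [k[1], \cdots, k[n]]^T \in \mathbb{F}_2^n$ denotes the $n$-tuple index in the transform domain. Likewise, each sample
$x[\mathbf{m}]$ has a WHT expansion as

$$x[\mathbf{m}] = \frac{1}{\sqrt{N}} \sum_{\mathbf{k} \in \mathbb{F}_2^n} X[\mathbf{k}] (-1)^{\langle \mathbf{m}, \mathbf{k} \rangle}. \tag{3}$$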
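As an illustration of definitions (2) and (3), the following minimal sketch evaluates the orthonormal WHT both directly and with the classic butterfly recursion. It assumes that each index vector is encoded as an integer whose bits are the index entries, so that the inner product over $\mathbb{F}_2$ becomes the parity of a bitwise AND; the function names are hypothetical and not taken from the paper.

```python
import math

def inner(k, m):
    """<k, m> over F_2: parity of the bitwise AND of two integer-encoded indices."""
    return bin(k & m).count("1") & 1

def wht_direct(x):
    """Direct O(N^2) evaluation of X[k] = (1/sqrt(N)) * sum_m x[m] * (-1)^<k,m>."""
    N = len(x)
    return [sum(x[m] * (-1) ** inner(k, m) for m in range(N)) / math.sqrt(N)
            for k in range(N)]

def fwht(x):
    """Fast WHT with O(N log N) butterflies; len(x) must be a power of two."""
    X = [float(v) for v in x]
    N = len(X)
    h = 1
    while h < N:
        for start in range(0, N, 2 * h):
            for i in range(start, start + h):
                a, b = X[i], X[i + h]
                X[i], X[i + h] = a + b, a - b   # one butterfly
        h *= 2
    return [v / math.sqrt(N) for v in X]        # orthonormal scaling as in (2)
```

With this scaling the transform is its own inverse, so applying `fwht` twice returns the original samples, which mirrors the symmetry between (2) and (3).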


2.1 Problem Setup 

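In this work, we consider the noisy scenario where the samples $x[\mathbf{m}]$ are corrupted by additive noise $w[\mathbf{m}] \sim
\mathcal{N}(0, \sigma^2)$, which is independent and normally distributed for all $\mathbf{m} \in \mathbb{F}_2^n$. Thus, we have access to only the
noise-corrupted samples:

$$u[\mathbf{m}] = \frac{1}{\sqrt{N}} \sum_{\mathbf{k} \in \mathbb{F}_2^n} X[\mathbf{k}] (-1)^{\langle \mathbf{m}, \mathbf{k} \rangle} + w[\mathbf{m}], \quad \mathbf{m} \in \mathbb{F}_2^n. \tag{4}$$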


Assumption 1. Let $\mathbf{X} \in \mathbb{R}^N$ be the WHT coefficient vector with support $\mathcal{K} := \mathrm{supp}(\mathbf{X})$. Throughout this paper,
we make the following assumptions:

A1 Each element in the support set $\mathcal{K}$ is chosen independently and uniformly at random from $[N]$.

A2 The sparsity $K = |\mathrm{supp}(\mathbf{X})| = O(N^\delta)$ is sub-linear in the dimension $N$ for some $0 < \delta < 1$.

A3 Each coefficient $X[\mathbf{k}]$ for $\mathbf{k} \in \mathcal{K}$ is chosen from a finite set $\mathcal{X} := \{\pm\rho\}$ uniformly at random.

A4 The signal-to-noise ratio (SNR) is defined as

$$\mathrm{SNR} = \frac{\|\mathbf{x}\|^2 / N}{\sigma^2} = \frac{\rho^2}{\sigma^2 N / K} \tag{5}$$

and is assumed to be an arbitrary constant value (i.e., $\rho$ scales with $\sqrt{N/K}$).
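To make the signal model concrete, here is a minimal sketch that draws one signal from the ensemble in Assumption 1 and produces the noisy samples of (4). It is an illustration only: the function name and parameters are hypothetical, the support is drawn without replacement so that exactly $K$ coefficients are non-zero (a simplification of A1), and index vectors are encoded as integers so that the inner product is the parity of a bitwise AND.

```python
import math
import random

def parity(v):
    """Parity of the set bits of v, i.e. the F_2 inner product after a bitwise AND."""
    return bin(v).count("1") & 1

def make_noisy_signal(n, K, snr, sigma=1.0, add_noise=True, seed=0):
    """Return (X, u, rho): sparse spectrum, time-domain samples, coefficient magnitude."""
    rng = random.Random(seed)
    N = 2 ** n
    # A4: SNR = rho^2 / (sigma^2 * N / K), hence rho scales with sqrt(N / K).
    rho = sigma * math.sqrt(snr * N / K)
    # A1 (simplified): K distinct support locations, uniform over [N].
    support = rng.sample(range(N), K)
    # A3: each non-zero coefficient is +rho or -rho with equal probability.
    X = {k: rng.choice((-rho, rho)) for k in support}
    u = []
    for m in range(N):
        clean = sum(v * (-1) ** parity(m & k) for k, v in X.items()) / math.sqrt(N)
        noise = rng.gauss(0.0, sigma) if add_noise else 0.0
        u.append(clean + noise)                 # equation (4)
    return X, u, rho
```

Generating all $N$ samples this way costs $O(NK)$ operations, which is fine for small experiments; the point of the framework is that the recovery algorithm itself only needs to read a small subset of them.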
Remark 1. While the uniform distribution assumption A1 on the support $\mathcal{K}$ is essential to the analysis of our
algorithm (as it is for FFAST [1] and SparseFHT), it can be generalized to accommodate non-uniform distributions that are of
practical interest in real-world applications. If we do not insist on the sub-linear sparsity regime imposed in A2,
our results reduce to $O(N)$ samples in time $O(N \log N)$, which is well understood in classic WHT computations.
Further, the binary constellation assumption A3 is imposed to simplify our analysis and can be readily extended to
any arbitrarily large but finite constellation, which subsumes all practical digital signals that have been quantized
with finite precision (essentially any signal processed by a digital computer). Last but not least, the constant SNR
assumption A4 covers all regimes of interest.

The goal of this paper is to develop a robust and efficient algorithm that reliably recovers exactly the entire
support $\mathcal{K}$ of the sparse WHT of a signal as well as the associated non-zero coefficients $X[\mathbf{k}]$ for $\mathbf{k} \in \mathcal{K}$ in the
presence of noise. The questions of interest are

1. How many noisy samples are needed to reliably recover the support of the sparse WHT? 

2. Can we reduce the computational complexity of the sparse WHT over that of the conventional WHT
algorithm, even in the presence of noise?

In the following, we first provide a summary of our main technical results, followed by a brief mention of 
previous work on computing sparse transforms. 


2.2 Main Result 


Our design is characterized by the triplet $(M, T, P_F)$, where $M$ is the sample complexity, $T$ is the computational
complexity in terms of arithmetic operations, and $P_F$ is the probability of failure in recovering the exact support of
the sparse WHT, given by

$$P_F := \mathbb{E}\left[ \mathbb{1}\{\mathrm{supp}(\widehat{\mathbf{X}}) \neq \mathrm{supp}(\mathbf{X})\} \right] \tag{6}$$

where $\widehat{\mathbf{X}}$ is the recovered WHT vector, $\mathbb{1}\{\cdot\}$ is the indicator function, $\mathrm{supp}(\cdot)$ represents the support of some vector, and the expectation is
obtained with respect to the randomization of our algorithm, the noise distribution, as well as the random signal
ensemble in Assumption 1.

Theorem 1. Let Assumption 1 hold for the signal of interest $\mathbf{x}$ and its WHT vector $\mathbf{X}$. Then for any sparsity regime
$K = O(N^\delta)$ with $0 < \delta < 1$, the SPRIGHT framework computes the $K$-sparse $N$-point WHT $\mathbf{X}$ with a vanishing
failure probability $P_F \to 0$ asymptotically in $K$ and $N$ using the following two algorithm options:

• the Sample Optimal (SO) SPRIGHT algorithm with a sample complexity of $M = O(K \log N)$ and a
computational complexity of $T = O(K \log^2 N)$;

• the Near Sample Optimal (NSO) SPRIGHT algorithm with a sample complexity of $M = O(K \log^2 N)$
and a computational complexity of $T = O(K \log^2 N)$.

Proof. See Appendix A. □


Remark 2. Since we assume an arbitrarily large but finite constellation X for each non-zero coefficient, we show 
that the coefficients can in fact be recovered perfectly, even from the noisy measurements with high probability. 
The recovery algorithm is equally applicable to support recovery for signals with arbitrary coefficients over the 
real field, but the analysis becomes overly cumbersome without offering more insights to our design. Hence we do 
not pursue it in this paper. 

Remark 3. Note that although the result in Theorem 1 is obtained with a randomized algorithm, our SPRIGHT
framework also admits the option of using a deterministic algorithm by spending an extra factor of $O(\log N)$ in
both sample complexity and computational complexity.

Note that the sample complexity is the number of raw samples needed as input for computations, as opposed to the measurement
complexity in compressed sensing, where each measurement may potentially require all the samples from the input vector.
2.3 Related Work


Due to the similarities between the DFT and the WHT, we give a brief account of previous work on reducing the
sample and computational complexity of obtaining a $K$-sparse $N$-point DFT. The most related research thread in
the literature is the computation of sparse DFT using theoretical computer science techniques such as sketching
and hashing (see, e.g., [16]). Most of these algorithms aim at minimizing the approximation error of the DFT
coefficients using an $\ell_2$-norm metric instead of exact support recovery (i.e., $\ell_0$-norm).

Among these works, the most recent progress in this direction is the sFFT (Sparse FFT) algorithm developed
in the series of papers starting with [10]. Most of these algorithms are based on first isolating (i.e., hashing) the non-zero
DFT coefficients into different bins, using specific filters or windows that have ‘good’ (concentrated) support in
both time and frequency. The non-zero DFT coefficients are then recovered iteratively, one at a time. The filters or
windows used for the binning operation are typically of length $O(K \log N)$. As a result, the sample complexity is
typically $O(K \log N)$ or more, with potentially large big-Oh constants as demonstrated in [13]. Then, [12] further
improved the 2-D DFT algorithm for the special case of $K = \sqrt{N}$, which reduces the sample complexity to $O(K)$
and the computational complexity to $O(K \log K)$, albeit with a constant failure probability that does not vanish
as the signal dimension $N$ grows. On this front, the deterministic algorithm in [14] is shown to guarantee zero
errors but with complexities of $O(\mathrm{poly}(K, \log N))$. More recently, [20] develops a deterministic algorithm for
computing a sparse WHT in time $O(K^{1+\epsilon} \log^{O(1)} N)$ with an arbitrary constant $\epsilon > 0$.

One of the interesting recent advances in computing sparse DFTs is the breaking of the “$N$-barrier”, which
means that the complexities no longer depend on the input dimension $N$. In particular, the FFAST algorithm [1]
uses only $O(K)$ samples and $O(K \log K)$ operations for any sparsity regime $K = O(N^\delta)$ and $\delta \in (0, 1)$. Similar
to the spirit of compressed sensing in linearly combining sparse components (i.e., DFT coefficients), the FFAST
algorithm judiciously chooses subsampling patterns to create spectral aliasing patterns that look like
“good” (i.e., near-capacity achieving) erasure-correcting codes [22]. The key insight is that we can effectively
transform the sparse DFT computation problem into that of sparse-graph decoding to reconstruct the original
“message” (i.e., sparse spectrum), which allows the use of a simple peeling-based decoder with very low complexity.
The success of the FFAST algorithm depends on the single-ton test to pinpoint frequency bins containing only one
“erasure event” (unknown non-zero DFT coefficient). Given such a single-ton bin, the value and location of the
coefficient can be obtained and then removed from other bins. This procedure iterates until no more single-ton
bins are found. In the same spirit as FFAST [1], the SparseFHT algorithm of Scheibler et al. elegantly computes a $K$-sparse WHT of
$\mathbf{x}$ using $O(K \log N)$ samples and $O(K \log^2 N)$ operations.


3 Main Idea: A Simple Example 

Since the sparsity is much smaller than the input dimension ($K \ll N$), it is desirable to compute the WHT
using very few samples $M \ll N$ without reading the entire signal. The most straightforward way to reduce
the number of samples to process is to subsample. However, from a reconstruction perspective, it is generally
disastrous to subsample since it creates aliasing in the spectral domain that mixes the WHT coefficients $X[\mathbf{k}]$.

The key idea of our SPRIGHT framework is to embrace (rather than avoid) the aliasing pattern as a form of
“alias code”, which is induced by the subsampling patterns guided by coding-theoretic designs, and more
specifically, sparse-graph codes such as Low Density Parity Check (LDPC) codes. Then, our SPRIGHT framework
exploits the aliasing pattern (alias code) to reconstruct the sparse Walsh spectrum in the presence of noise, by
uncovering the sparse coefficients one-by-one iteratively in the spirit of decoding over noisy channels. While the
design philosophy is similar to the FFAST algorithm in [1] and the SparseFHT algorithm of Scheibler et al., our framework
non-trivially generalizes this to the noisy scenario by robustifying the “alias code” for noisy decoding.
Interestingly, we show that our framework can maintain the same scaling in both sample complexity and computational
complexity as that in the noiseless case. For completeness, we will repeat the noiseless design in the sequel,
but using our setup and terminology.
3.1 Subsampling and Aliasing

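Our observation model is based on using multiple basic observation sets formed by randomized subsampling and
tiny-sized WHTs, where each set contains $B = 2^b$ (for some $b > 0$) samples obtained as:

• Subsampling: consider some integer $b < n$. The subsampling of the noisy signal $u[\mathbf{m}]$ in (4) is performed by
isolating a subset of $B = 2^b$ samples indexed by $\mathbf{m} = \mathbf{M}\boldsymbol{\ell} + \mathbf{d}$ for $\boldsymbol{\ell} \in \mathbb{F}_2^b$, where $\mathbf{M} \in \mathbb{F}_2^{n \times b}$ is some binary
matrix and $\mathbf{d} \in \mathbb{F}_2^n$ is some random binary vector. In other words, after generating $\mathbf{M} \in \mathbb{F}_2^{n \times b}$ and $\mathbf{d} \in \mathbb{F}_2^n$,
the subset of samples is selected by running the $b$-tuple $\boldsymbol{\ell}$ over $\mathbb{F}_2^b$.

• $B$-point WHT: a much smaller $B$-point WHT is performed over the samples $u[\mathbf{M}\boldsymbol{\ell} + \mathbf{d}]$ for $\boldsymbol{\ell} \in \mathbb{F}_2^b$. The
subsampled signal has an aliased WHT spectrum readily obtained by a $B$-point WHT

$$U[\mathbf{j}] = \frac{\sqrt{N}}{B} \sum_{\boldsymbol{\ell} \in \mathbb{F}_2^b} u[\mathbf{M}\boldsymbol{\ell} + \mathbf{d}] (-1)^{\langle \mathbf{j}, \boldsymbol{\ell} \rangle}, \quad \mathbf{j} \in \mathbb{F}_2^b. \tag{7}$$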


Example 1. We consider an example with $n = 4$ and sparsity $K = B = 2^b = 4$ (i.e. $b = 2$). For simplicity, we
construct 2 sets of observations using

$$\mathbf{M}_1 = [\mathbf{0}_{2 \times 2}^T, \mathbf{I}_{2 \times 2}^T]^T, \quad \mathbf{M}_2 = [\mathbf{I}_{2 \times 2}^T, \mathbf{0}_{2 \times 2}^T]^T. \tag{8}$$

We call each set of observations using a different subsampling pattern a subsampling group. With these patterns,
we access the following samples in each group for $\boldsymbol{\ell} = [\ell_1, \ell_2]^T \in \mathbb{F}_2^2$:

$$u[\mathbf{M}_1 \boldsymbol{\ell}] = u[0\,0\,\ell_1\,\ell_2] \in \{u[0000],\, u[0001],\, u[0010],\, u[0011]\},$$

$$u[\mathbf{M}_2 \boldsymbol{\ell}] = u[\ell_1\,\ell_2\,0\,0] \in \{u[0000],\, u[0100],\, u[1000],\, u[1100]\}.$$

After performing a $B$-point WHT on each set of these samples, we have 2 sets of noisy observations:

$$
\begin{aligned}
U_1[00] &= X[0000] + X[0100] + X[1000] + X[1100] + W_1[00] \\
U_1[01] &= X[0001] + X[0101] + X[1001] + X[1101] + W_1[01] \\
U_1[10] &= X[0010] + X[0110] + X[1010] + X[1110] + W_1[10] \\
U_1[11] &= X[0011] + X[0111] + X[1011] + X[1111] + W_1[11] \\
U_2[00] &= X[0000] + X[0001] + X[0010] + X[0011] + W_2[00] \\
U_2[01] &= X[0100] + X[0101] + X[0110] + X[0111] + W_2[01] \\
U_2[10] &= X[1000] + X[1001] + X[1010] + X[1011] + W_2[10] \\
U_2[11] &= X[1100] + X[1101] + X[1110] + X[1111] + W_2[11]
\end{aligned}
$$

3.2 Computing Sparse WHT as Sparse-Graph Decoding 

In the presence of noise, the coefficients $X[\mathbf{k}]$ should be intuitively obtained as the “least-squares” solution over the
2 sets of $B$ observations in Example 1. However, the linear regression problem is underdetermined as we are given
8 equations with 16 unknowns. Fortunately, the coefficients are sparse, and this helps significantly. For simplicity,
suppose that the 4 non-zero coefficients are $X[0100] = 2$, $X[0110] = 4$, $X[1010] = 1$ and $X[1111] = 1$. Now we
have 8 equations with 4 unknowns (non-zero), but we do not know which unknowns are non-zero. Then, we have

$$
\begin{aligned}
U_1[00] &= X[0100] + W_1[00], & U_2[00] &= W_2[00] \\
U_1[01] &= W_1[01], & U_2[01] &= X[0100] + X[0110] + W_2[01] \\
U_1[10] &= X[0110] + X[1010] + W_1[10], & U_2[10] &= X[1010] + W_2[10] \\
U_1[11] &= X[1111] + W_1[11], & U_2[11] &= X[1111] + W_2[11].
\end{aligned}
$$
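The eight equations above can be checked numerically. The sketch below is a noise-free illustration (all $W_c[\mathbf{j}]$ set to zero): it synthesizes the time-domain samples from the four non-zero coefficients via (3), reads only the four samples of each subsampling group, and applies a 4-point WHT scaled by $\sqrt{N}/B$. Indices are written as bit tuples $(k[1], k[2], k[3], k[4])$, and all helper names are hypothetical.

```python
import math
from itertools import product

n, b = 4, 2
N, B = 2 ** n, 2 ** b

# The four non-zero WHT coefficients of the example, keyed by (k[1], k[2], k[3], k[4]).
X = {(0, 1, 0, 0): 2.0, (0, 1, 1, 0): 4.0, (1, 0, 1, 0): 1.0, (1, 1, 1, 1): 1.0}

def dot(a, c):
    """Inner product over F_2 of two equal-length bit tuples."""
    return sum(p * q for p, q in zip(a, c)) % 2

def x_sample(m):
    """Noise-free time-domain sample x[m] from the WHT expansion (3)."""
    return sum(v * (-1) ** dot(m, k) for k, v in X.items()) / math.sqrt(N)

def index_group1(l):
    return (0, 0, l[0], l[1])      # m = M_1 l = [0 0 l1 l2]

def index_group2(l):
    return (l[0], l[1], 0, 0)      # m = M_2 l = [l1 l2 0 0]

def small_wht(index_of):
    """B-point WHT of the B subsampled values, scaled by sqrt(N)/B."""
    out = {}
    for j in product((0, 1), repeat=b):
        acc = sum(x_sample(index_of(l)) * (-1) ** dot(j, l)
                  for l in product((0, 1), repeat=b))
        out[j] = math.sqrt(N) / B * acc
    return out

U1 = small_wht(index_group1)
U2 = small_wht(index_group2)
```

Only four of the sixteen samples are touched per group, yet every non-zero coefficient shows up in exactly one bin of each group, which is the aliasing structure the decoder exploits.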
Now this problem seems quite a bit less daunting since the number of equations is more than the number of
unknowns. The challenging part, however, is that we do not know in advance which coefficients $X[\mathbf{k}]$ exist in the
equations since the sparse coefficients are randomly chosen over $\mathbf{k} \in \mathbb{F}_2^n$. Here, we illustrate the principle of our
recovery algorithm through the same simple example by showing that the recovery is an instance of sparse-graph
decoding with the help of an “oracle” (described later). Then in the next subsection, we will introduce how to get
rid of the oracle.


3.2.1 Oracle-based Sparse-Graph Decoding 


The relationship between the observations $\{U_c[\mathbf{j}]\}_{c=1,2}$ and the unknown coefficients $X[\mathbf{k}]$ can be shown as a
bipartite graph in Fig. 1, where the left nodes (unknown coefficients $X[\mathbf{k}]$) and right nodes (observations $U_c[\mathbf{j}]$)
are referred to as the variable nodes and check nodes respectively in the language of sparse-graph codes.
Depending on the connectivity of the sparse bipartite graph, we categorize the observations into the following types:

1. Zero-ton: a check node is a zero-ton if it has no non-zero coefficients (e.g., the color blue in Fig. 1).

2. Single-ton: a check node is a single-ton if it involves only one non-zero coefficient (e.g., the color yellow in
Fig. 1). Specifically, we refer to the index $\mathbf{k}$ and its associated value $X[\mathbf{k}]$ as the index-value pair $(\mathbf{k}, X[\mathbf{k}])$.

3. Multi-ton: a check node is a multi-ton if it contains more than one non-zero coefficient (e.g., the color red in
Fig. 1).


To illustrate our reconstruction algorithm, we assume that
there exists an “oracle” that informs the decoder exactly which
check nodes are single-tons. Furthermore, the oracle
provides the index-value pair for each single-ton. In this
example, the oracle informs the decoder that check nodes labeled
$U_1[00]$, $U_1[11]$, $U_2[10]$ and $U_2[11]$ are single-tons with
index-value pairs $(0100, X[0100])$, $(1111, X[1111])$, $(1010, X[1010])$
and $(1111, X[1111])$ respectively. Then the decoder can subtract
their contributions from other check nodes, forming new
single-tons. Therefore, generally speaking, with the oracle information,
the peeling decoder repeats the following steps:

Step (1) select all the edges in the bipartite graph with right degree
1 (identify single-ton bins);

Step (2) remove (peel off) these edges as well as the
corresponding pair of variable and check nodes connected to these
edges;

Step (3) remove (peel off) all other edges connected to the
variable nodes that have been removed in Step (2);

Step (4) subtract the contributions of the variable nodes from the
check nodes whose edges have been removed in Step (3).

Finally, decoding is successful if all the edges are removed from
the graph together with all the unknown coefficients $X[\mathbf{k}]$ such
that all the WHT coefficients are decoded.
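The four steps can be sketched on the example graph as follows. This is a noise-free illustration in which the oracle is played by ground-truth bookkeeping: the decoder is told which variable nodes sit in each check node, something a real decoder does not know. The names `peel_with_oracle` and `hashes`, and the string encoding of indices, are assumptions made for this sketch.

```python
# Non-zero coefficients of the running example, indices written as k[1]k[2]k[3]k[4].
X = {"0100": 2.0, "0110": 4.0, "1010": 1.0, "1111": 1.0}

# Hash functions of the two groups: group 1 keeps (k[3], k[4]), group 2 keeps (k[1], k[2]).
hashes = [lambda k: k[2:], lambda k: k[:2]]

def peel_with_oracle(X, hashes):
    """Recover all coefficients by peeling, with the oracle given by ground truth."""
    members = {}   # check node (group, bin) -> set of variable nodes still attached
    obs = {}       # check node (group, bin) -> current (noise-free) observation
    for k, v in X.items():
        for c, h in enumerate(hashes):
            node = (c, h(k))
            members.setdefault(node, set()).add(k)
            obs[node] = obs.get(node, 0.0) + v
    recovered = {}
    progress = True
    while progress:
        progress = False
        for node in list(members):
            if len(members[node]) == 1:            # Step (1): oracle flags a single-ton
                (k,) = members[node]
                value = obs[node]                  # a single-ton observes its coefficient
                recovered[k] = value
                for c, h in enumerate(hashes):     # Steps (2)-(4): peel and subtract
                    other = (c, h(k))
                    members[other].discard(k)
                    obs[other] -= value
                progress = True
    return recovered

decoded = peel_with_oracle(X, hashes)
```

In this example the multi-ton bins $U_1[10]$ and $U_2[01]$ only become decodable after a neighbouring single-ton has been peeled, which is why the procedure has to iterate.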



Figure 1: Example of a sparse bipartite graph consisting of 4 (non-zero) left nodes (variable nodes)
connected to the 2 subsampling groups as a result of the sub-sampling-based randomized hashing
in each group. Blue color represents “zero-ton”, yellow color represents “single-ton” and red
color represents “multi-ton”.
3.2.2 Getting Rid of the Oracle: Bin Detection


Since the oracle information is critical in the peeling process, we proceed with our example and explain briefly how
to obtain such information without an oracle. We call this procedure “bin detection”. For simplicity, we illustrate
the design where the samples are noise-free. To obtain the oracle information, we exploit the diversity of using
different offsets. For instance, in group 1, we use the subsampling matrix $\mathbf{M}_1$ and the following set of offsets

$$\mathbf{d}_{1,0} = [0,0,0,0]^T, \quad \mathbf{d}_{1,1} = [1,0,0,0]^T, \quad \mathbf{d}_{1,2} = [0,1,0,0]^T, \quad \mathbf{d}_{1,3} = [0,0,1,0]^T, \quad \mathbf{d}_{1,4} = [0,0,0,1]^T.$$

In this way, using the subsampling pattern $\mathbf{M}_1$ and the offsets above, each check node is now assigned a
5-dimensional vector $\mathbf{U}_1[\mathbf{j}] = [U_{1,0}[\mathbf{j}], \cdots, U_{1,4}[\mathbf{j}]]^T$, where $U_{1,p}[\mathbf{j}]$ is associated with the $p$-th
offset $\mathbf{d}_{1,p}$ for $p = 0, 1, \cdots, 4$. We call each vector of observations $\mathbf{U}_c[\mathbf{j}]$ in one group the bin observation vector
$\mathbf{j}$. For example, the bin observation vectors for group 1 are obtained as $\mathbf{U}_1[00] = \mathbf{0}$ and

$$
\mathbf{U}_1[01] = X[0100] \begin{bmatrix} 1 \\ (-1)^0 \\ (-1)^1 \\ (-1)^0 \\ (-1)^0 \end{bmatrix}, \quad
\mathbf{U}_1[10] = X[0110] \begin{bmatrix} 1 \\ (-1)^0 \\ (-1)^1 \\ (-1)^1 \\ (-1)^0 \end{bmatrix} + X[1010] \begin{bmatrix} 1 \\ (-1)^1 \\ (-1)^0 \\ (-1)^1 \\ (-1)^0 \end{bmatrix}, \quad
\mathbf{U}_1[11] = X[1111] \begin{bmatrix} 1 \\ (-1)^1 \\ (-1)^1 \\ (-1)^1 \\ (-1)^1 \end{bmatrix}.
$$
Now with these bin observations, one can effectively determine if a check node is a zero-ton, a single-ton or a
multi-ton. We go through some examples:

• zero-ton bin: consider the zero-ton check node $\mathbf{U}_1[00]$. A zero-ton check node can be identified easily since
the measurements are all zero, $\mathbf{U}_1[00] = \mathbf{0}$.

• multi-ton bin: consider the multi-ton check node $\mathbf{U}_1[10]$. A multi-ton can be easily identified since the
magnitudes $|U_{1,0}[10]|, |U_{1,1}[10]|, |U_{1,2}[10]|, |U_{1,3}[10]|, |U_{1,4}[10]|$ are not all identical, or namely, the
following ratio condition is not met:

$$\frac{U_{1,p}[10]}{U_{1,0}[10]} = \pm 1, \quad p = 1, 2, 3, 4. \tag{9}$$

Therefore, if the ratio test does not produce $\pm 1$ or the magnitudes are not identical, we can conclude that
this check node is a multi-ton.

• single-ton bin: consider the single-ton check node $\mathbf{U}_1[01]$. The underlying node is a single-ton if $|U_{1,0}[01]| =
|U_{1,1}[01]| = |U_{1,2}[01]| = |U_{1,3}[01]| = |U_{1,4}[01]|$, or namely the ratio test produces all $\pm 1$. Then, the index
$\mathbf{k} = [k[1], k[2], k[3], k[4]]^T$ of a single-ton can be obtained by a simple ratio test

$$
\begin{aligned}
(-1)^{\hat{k}[1]} &= \frac{U_{1,1}[01]}{U_{1,0}[01]} = (-1)^0 \\
(-1)^{\hat{k}[2]} &= \frac{U_{1,2}[01]}{U_{1,0}[01]} = (-1)^1 \\
(-1)^{\hat{k}[3]} &= \frac{U_{1,3}[01]}{U_{1,0}[01]} = (-1)^0 \\
(-1)^{\hat{k}[4]} &= \frac{U_{1,4}[01]}{U_{1,0}[01]} = (-1)^0
\end{aligned}
\quad \Longrightarrow \quad
\begin{cases}
\hat{k}[1] = 0 \\
\hat{k}[2] = 1 \\
\hat{k}[3] = 0 \\
\hat{k}[4] = 0 \\
\widehat{X}[\hat{\mathbf{k}}] = U_{1,0}[01]
\end{cases}
$$

Both the ratio test and the magnitude constraints are easy to verify for all check nodes such that the
index-value pair is obtained for peeling.
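The three tests above can be collected into a single noise-free bin detector. The sketch below assumes the offsets of this example (the all-zero offset followed by the $n$ unit vectors), takes one bin observation vector as a plain list, and uses a hypothetical function name and tolerance. It is not the noise-robust detector of the SPRIGHT framework, and a multi-ton whose coefficients happen to cancel could in principle pass the magnitude test.

```python
def detect_bin(U, tol=1e-9):
    """Classify a noise-free bin observation vector U = [U_0, U_1, ..., U_n].

    U_0 comes from the all-zero offset and U_p from the p-th unit-vector offset.
    Returns (type, index, value); index and value are None unless a single-ton is found.
    """
    if all(abs(v) <= tol for v in U):
        return ("zero-ton", None, None)            # all measurements vanish
    ref = U[0]
    if abs(ref) <= tol or any(abs(abs(v) - abs(ref)) > tol for v in U[1:]):
        return ("multi-ton", None, None)           # ratio test (9) fails
    # Ratio test: U_p / U_0 = (-1)^{k[p]}, so a sign flip means k[p] = 1.
    k = tuple(0 if v * ref > 0 else 1 for v in U[1:])
    return ("single-ton", k, ref)

# Bin observation vectors of the example with X[0100] = 2, X[0110] = 4, X[1010] = 1.
single = [2, 2, -2, 2, 2]        # X[0100] * [1, 1, -1, 1, 1]
multi = [5, 3, -3, -5, 5]        # 4 * [1, 1, -1, -1, 1] + 1 * [1, -1, 1, -1, 1]
zero = [0, 0, 0, 0, 0]
```

Calling `detect_bin(single)` returns the index bits `(0, 1, 0, 0)` together with the value 2, which is exactly the index-value pair the oracle would have supplied.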

This simple example shows how the problem of recovering the $K$-sparse coefficients $X[\mathbf{k}]$ can be cast as
an instance of oracle-based peeling decoding by proper subsampling-induced sparse bipartite graphs in the dual
domain. It further shows that the freedom in choosing offsets $\mathbf{d}$ gets rid of the oracle by bin detection. However,
this simple example will not work in the presence of noise. The key idea of our design is that by carefully choosing
the offsets $\mathbf{d}$ and subsampling patterns $\mathbf{M}$ through a sparse-graph coding lens, we can induce “peeling-friendly”
sparse bipartite graphs that lead to fast recovery of the unknown WHT coefficients even in the presence of noise,
as illustrated next.



4 The SPRIGHT Framework: General Architecture and Algorithm 


In this section, we generalize the simple example and present our proposed SPRIGHT framework. Our
framework consists of an observation generator and a reconstruction engine, as shown in Fig. 2.

Figure 2: The conceptual diagram of our learning framework with $C$ subsampling groups, where each group generates $P$
basic query sets, each of size $B = 2^b$.


4.1 Observation Generator: Subsampling and Aliasing 

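In our SPRIGHT framework, the observations are obtained from $C$ subsampling groups, where each group
generates $P$ basic observation sets of size $B = 2^b$. Each group uses a different matrix $\mathbf{M}_c \in \mathbb{F}_2^{n \times b}$ and a different set of
$P$ offsets $\mathbf{d}_{c,p} \in \mathbb{F}_2^n$ for $p \in [P]$, as summarized in Algorithm 1.

Algorithm 1 Subsampling and WHT
Input : $u[\mathbf{m}]$ for $\mathbf{m} \in \mathbb{F}_2^n$ with $N = 2^n$;
Set : the number of subsampling groups $C$; observation set size $B$ and number of observation sets $P$.
Generate : offsets $\mathbf{d}_{c,p}$ for $p \in [P]$; subsampling matrix $\mathbf{M}_c \in \mathbb{F}_2^{n \times b}$ for some $b > 0$.

for c = 1 to C do
    for p = 1 to P do
        $U_{c,p}[\mathbf{j}] = \frac{\sqrt{N}}{B} \sum_{\boldsymbol{\ell} \in \mathbb{F}_2^b} u[\mathbf{M}_c \boldsymbol{\ell} + \mathbf{d}_{c,p}] (-1)^{\langle \mathbf{j}, \boldsymbol{\ell} \rangle}$.
    end for
end for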
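A direct transcription of Algorithm 1 might look like the sketch below. It assumes that index vectors, the columns of each $\mathbf{M}_c$ and the offsets are all encoded as $n$-bit integers, so that addition over $\mathbb{F}_2^n$ becomes XOR and inner products become parities; the function name and data layout are hypothetical. For brevity the $B$-point WHT is evaluated directly in $O(B^2)$ operations, whereas a butterfly implementation would take $O(B \log B)$.

```python
import math

def parity(v):
    """Parity of the set bits of v (F_2 inner product after a bitwise AND)."""
    return bin(v).count("1") & 1

def generate_observations(u, n, groups):
    """Algorithm 1 sketch. groups is a list of (cols, offsets) pairs, where cols holds
    the b columns of M_c and offsets holds the P offsets d_{c,p}, all as n-bit integers.
    Returns U with U[c][p][j] as in Algorithm 1, j being an integer-encoded bin index."""
    N = 2 ** n
    U = []
    for cols, offsets in groups:
        B = 2 ** len(cols)
        # Precompute M_c * l for every l in F_2^b (XOR of the selected columns).
        Ml = []
        for l in range(B):
            m = 0
            for i, col in enumerate(cols):
                if (l >> i) & 1:
                    m ^= col
            Ml.append(m)
        group = []
        for d in offsets:
            samples = [u[m ^ d] for m in Ml]       # u[M_c l + d_{c,p}]
            group.append([math.sqrt(N) / B *
                          sum(s * (-1) ** parity(j & l) for l, s in enumerate(samples))
                          for j in range(B)])
        U.append(group)
    return U
```

Each group reads at most $PB$ samples, so the total number of samples touched is at most $CPB$ regardless of $N$.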


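Proposition 1 (Basic Observation Model). The $B$-point WHT coefficients indexed by $\mathbf{j} \in \mathbb{F}_2^b$ can be written as:

$$U_{c,p}[\mathbf{j}] = \sum_{\mathbf{k}:\, \mathbf{M}_c^T \mathbf{k} = \mathbf{j}} X[\mathbf{k}] (-1)^{\langle \mathbf{d}_{c,p}, \mathbf{k} \rangle} + W_{c,p}[\mathbf{j}], \quad p \in [P], \tag{10}$$

where $W_{c,p}[\mathbf{j}] = \sum_{\mathbf{k}:\, \mathbf{M}_c^T \mathbf{k} = \mathbf{j}} W[\mathbf{k}] (-1)^{\langle \mathbf{d}_{c,p}, \mathbf{k} \rangle}$ and $W[\mathbf{k}]$ is the WHT coefficient of the noise samples $w[\mathbf{m}]$.

Clearly, the $\mathbf{j}$-th WHT coefficient $U_{c,p}[\mathbf{j}]$ in each observation set is an aliased version (hash output) of the
Walsh spectral coefficient $X[\mathbf{k}]$ under the hash function $\mathcal{H}_c : \mathbb{F}_2^n \to \mathbb{F}_2^b$ in the $c$-th group

$$\mathbf{j} = \mathcal{H}_c(\mathbf{k}) = \mathbf{M}_c^T \mathbf{k}, \quad c \in [C]. \tag{11}$$

It can be observed that the aliasing pattern (hash function) is invariant with respect to the offsets $\mathbf{d}_{c,p}$ used in
subsampling. Similar to the bin observation vector in the simple example from Section 3.2.2, we can regroup the
observations $U_{c,p}[\mathbf{j}]$ according to the hash $\mathcal{H}_c$ as

$$\mathbf{U}_c[\mathbf{j}] := [\cdots, U_{c,p}[\mathbf{j}], \cdots]^T \tag{12}$$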
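by stacking the $\mathbf{j}$-th WHT coefficient associated with all the offsets across the $P$ observation sets in a vector.

Proposition 2 (Bin Observation Model). Given the offset matrix $\mathbf{D}_c := [\cdots; \mathbf{d}_{c,p}; \cdots] \in \mathbb{F}_2^{P \times n}$, the $\mathbf{j}$-th bin
observation vector in the $c$-th group can be written as

$$\mathbf{U}_c[\mathbf{j}] = \sum_{\mathbf{k}:\, \mathbf{M}_c^T \mathbf{k} = \mathbf{j}} X[\mathbf{k}] (-1)^{\mathbf{D}_c \mathbf{k}} + \mathbf{W}_c[\mathbf{j}], \quad \mathbf{j} \in \mathbb{F}_2^b, \; c \in [C], \tag{13}$$

where $(-1)^{(\cdot)}$ is the element-wise exponentiation operator and $\mathbf{W}_c[\mathbf{j}] = \sum_{\mathbf{k}:\, \mathbf{M}_c^T \mathbf{k} = \mathbf{j}} W[\mathbf{k}] (-1)^{\mathbf{D}_c \mathbf{k}}$ is the noise
vector with $W[\mathbf{k}]$ being the WHT coefficient of the noise.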
Proof. The proof follows from WHT properties similar to that in Q, and hence is omitted here. 


□ 


From a coding-theoretic perspective, the observation vectors $\mathbf{U}_c[\mathbf{j}]$ for $\mathbf{j} \in \mathbb{F}_2^b$ across different groups $c \in [C]$
constitute the parity constraints of the coefficients $X[\mathbf{k}]$, where $X[\mathbf{k}]$ enters the $\mathbf{j}$-th parity of group $c$ if $\mathbf{M}_c^T \mathbf{k} = \mathbf{j}$.
It can be shown that if the set size $B = 2^b$ and the number of subsampling groups $C$ are chosen properly, the bin
observation vectors constitute parities of good error-correcting codes. Therefore, the coefficients can be uncovered
iteratively in the spirit of peeling decoding (see Section 4.2), similar to that in LDPC codes. The key idea is
to avoid excessive aliasing by maintaining $B$ on par with the sparsity $O(K)$ and imposing $P = O(\log N)$ for
denoising purposes in bin detection. To keep our discussions focused, we defer the specific constructions of the
subsampling model in terms of $C$, $B = 2^b$ and $\{\mathbf{M}_c\}_{c \in [C]}$ to Section 4.3.

4.2 Reconstruction Engine: Peeling Decoder 

The outputs from the subsampling operation are then used for reconstruction. As stated in Proposition 2, each bin
observation vector consists of linear combinations of the unknown WHT coefficients, which can be characterized
by a sparse bipartite graph consisting of $K$ left nodes (variable nodes) and $CB$ right nodes (check nodes).

Definition 1 (Random Graph Ensemble). For some redundancy parameter $\eta > 0$, let $B = \eta K = 2^b$ for some
$b > 0$. The graph ensemble $\mathcal{G}(K, \eta, C, \{M_c\}_{c \in [C]})$ consists of left C-regular sparse bipartite graphs where

• there are K left nodes (variable nodes), each labeled by a distinct element from the support $k \in \mathcal{K}$;

• there are $B = 2^b$ right nodes (check nodes) per group, each labeled by the bin index $j \in \mathbb{F}_2^b$ and assigned
the bin observation vector $U_c[j]$;

• each left node k has degree C and each edge is connected to a right node j in each group according to the
hash function $\mathcal{H}_c : \mathbb{F}_2^n \to \mathbb{F}_2^b$ given in (11).

Based on the simple example in Section 3, the unknown WHT coefficients (i.e. variable nodes) can be recovered
through a peeling decoder over the graph ensemble $\mathcal{G}(K, \eta, C, \{M_c\}_{c \in [C]})$, summarized in Algorithm 2.
The key is to distinguish the observations $U_c[j]$ and identify single-ton bins for peeling.

In Algorithm 2, we denote the bin detection routine by

$\psi : \mathbb{R}^P \to (\text{type}, \hat{k}, \hat{X}[\hat{k}])$   (14)

which determines the types of bin observations:

1. $U_c[j]$ is a zero-ton if there does not exist $X[k] \neq 0$ such that $M_c^T k = j$, denoted by $U_c[j] \sim \mathcal{H}_Z$;

2. $U_c[j]$ is a single-ton with the index-value pair $(k, X[k])$ if there exists only one $X[k] \neq 0$ such that $M_c^T k = j$,
denoted by $U_c[j] \sim \mathcal{H}_S(k, X[k])$;

3. $U_c[j]$ is a multi-ton if there exists more than one $X[k] \neq 0$ such that $M_c^T k = j$, denoted by $U_c[j] \sim \mathcal{H}_M$.

Algorithm 2 Peeling Decoder

Input: observation vectors $U_c[j]$ for $j \in \mathbb{F}_2^b$, $c \in [C]$;
Set: the number of peeling iterations $I$;
for $i = 1$ to $I$ do
    for $c = 1$ to $C$ do
        for $j \in \mathbb{F}_2^b$ do
            $(\text{type}, \hat{k}, \hat{X}[\hat{k}]) = \psi(U_c[j])$.
            if type = single-ton then
                Peel off for all $p \in [P]$, $c' \in [C]$:
                    Locate bin index $j_{c'} = M_{c'}^T \hat{k}$
                    $U_{c',p}[j_{c'}] \leftarrow U_{c',p}[j_{c'}] - \hat{X}[\hat{k}] (-1)^{\langle d_{c',p}, \hat{k} \rangle}$
            else if type $\neq$ single-ton then
                continue to next $j$.
            end if
        end for
    end for
end for


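Figure 3: The bin detection routine $\psi : \mathbb{R}^P \to (\text{type}, \hat{k}, \hat{X}[\hat{k}])$ for the noiseless setting by choosing offsets
$D_c = I_{n \times n}$. [Flowchart: the zero-ton verification asks whether $U_{c,p}[j] = 0$ for all $p = 1, \cdots, n$; if yes, the bin is a
zero-ton, $U_c[j] \sim \mathcal{H}_Z$. Otherwise the single-ton verification asks whether $U_{c,p}[j] / U_{c,0}[j] = \pm 1$ for all
$p = 1, \cdots, n$; if no, the bin is a multi-ton. If yes, the single-ton search recovers $\hat{k}[p]$ from the signs for all
$p = 1, \cdots, n$ and sets $\hat{X}[\hat{k}] = U_{c,0}[j]$, so that the bin is a single-ton, $U_c[j] \sim \mathcal{H}_S(\hat{k}, \hat{X}[\hat{k}])$.]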


We focus on the noiseless case here (a generalization of the simple example), and then elaborate on robust bin
detection in the presence of noise in Section 5. The noiseless bin detection requires P = n offsets through the
steps summarized in Fig. 3:

• $U_c[j] \sim \mathcal{H}_Z$ if $U_{c,p}[j] = 0$ for all $p = 1, \cdots, n$.

• $U_c[j] \sim \mathcal{H}_M$ if $|U_{c,p}[j] / U_{c,0}[j]| \neq 1$ for some $p = 1, \cdots, n$.

• $U_c[j] \sim \mathcal{H}_S(\hat{k}, \hat{X}[\hat{k}])$ if the bin is neither a zero-ton nor a multi-ton.

The index-value pair $(k, X[k])$ of the single-ton is obtained as follows. Since each single-ton bin observation
satisfies $U_{c,p}[j] = X[k] (-1)^{\langle d_{c,p}, k \rangle}$, the corresponding sign satisfies

$\mathrm{sgn}[U_{c,p}[j]] = \langle d_{c,p}, k \rangle \oplus \mathrm{sgn}[X[k]],$   (15)

where $\mathrm{sgn}[X[k]]$ is the nuisance unknown sign. This nuisance can be removed by imposing a reference offset
$d_{c,0} = 0$ in addition to the offset matrix $D_c \in \mathbb{F}_2^{n \times n}$, such that $\mathrm{sgn}[U_{c,0}[j]] = \mathrm{sgn}[X[k]]$.
This gives us a set of linear equations with respect to the unknown index k:

$\begin{bmatrix} \mathrm{sgn}[U_{c,1}[j]] \oplus \mathrm{sgn}[U_{c,0}[j]] \\ \mathrm{sgn}[U_{c,2}[j]] \oplus \mathrm{sgn}[U_{c,0}[j]] \\ \vdots \\ \mathrm{sgn}[U_{c,n}[j]] \oplus \mathrm{sgn}[U_{c,0}[j]] \end{bmatrix} = D_c k.$   (16)

Clearly, if we choose the offsets in each group as $D_c = I_{n \times n}$, the unknown index k can be obtained directly from
the signs of the observations. Finally, the value of the coefficient is obtained as $\hat{X}[\hat{k}] = U_{c,0}[j]$.
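The noiseless routine above can be sketched in a few lines. This is a minimal illustration, assuming offsets $D_c = I_{n \times n}$ plus the zero reference offset; the function name `detect_noiseless` and the numerical tolerance `tol` are hypothetical, and the zero-ton test here also checks the reference observation.

```python
import numpy as np

def sgn(x):
    """Sign convention used in the text: 1 if x < 0, and 0 otherwise."""
    return int(x < 0)

def detect_noiseless(U0, U, tol=1e-9):
    """
    U0: observation for the reference offset d_0 = 0.
    U:  n observations for the offsets D = I (one per bit of the index).
    Returns (type, k_hat, X_hat).
    """
    U = np.asarray(U, dtype=float)
    # Zero-ton verification: every observation vanishes.
    if abs(U0) < tol and np.all(np.abs(U) < tol):
        return ("zero-ton", None, None)
    # Single-ton verification: every ratio U_p / U_0 must equal +1 or -1.
    if abs(U0) < tol or np.any(np.abs(np.abs(U / U0) - 1.0) > tol):
        return ("multi-ton", None, None)
    # Single-ton search: with D = I, bit p of k is sgn[U_p] XOR sgn[U_0].
    k_hat = [sgn(u) ^ sgn(U0) for u in U]
    return ("single-ton", k_hat, U0)

# A single-ton with k = [1, 0, 1] and X[k] = -2.5 gives U_p = X[k] * (-1)^{k[p]}.
result = detect_noiseless(-2.5, [2.5, -2.5, 2.5])
```

The exact equality tests are only meaningful without noise; Section 5 replaces them with energy thresholds.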


4.3 Subsampling Design and Algorithm Guarantees 

With the general subsampling architecture given in Section 4.1, we discuss the specific constructions of the graph
ensemble $\mathcal{G}(K, \eta, C, \{M_c\}_{c \in [C]})$ by choosing appropriately the observation set size $B = 2^b$, the number of
subsampling groups C, and the subsampling matrices $\{M_c\}_{c \in [C]}$. We defer the discussion of how to choose offsets
$d_{c,p}$ to Section 5 because its design is independent of the graph ensemble.

Footnote: Note that the definition of the sign function here is a bit different than usual, where $\mathrm{sgn}[x] = 1$ if $x < 0$ and $\mathrm{sgn}[x] = 0$ if $x > 0$.

Let us first give some high level intuition of our subsampling design. Regardless of how many observation sets P
are generated in each subsampling group $c \in [C]$, it is desirable to keep the number of subsampling groups C and
the observation set size $B = 2^b$ small such that the resulting sample complexity is small. However, if C and B are
too small, the resulting observation bins will end up mostly with multi-tons so the peeling operations get stuck.
As a result, the subsampling design is about finding the "sweet spot" for the number of subsampling groups C and
the observation set size B. In our analysis, we show that the product satisfies $CB = O(K)$, which implies that the
subsampling using our generator does not introduce extra overheads other than a constant factor compared to
the sparsity K. More importantly, from our analysis, such a constant can be made explicit given the number
of subsampling groups C.

The subsampling design varies with the sparsity regime $0 < \delta < 1$ and hence, our results are stated with
respect to different intervals of $\delta$ that cover the entire sparsity regime (see Appendix B). Our results stated below
present one constructive scheme using the partition $(0, 1/3] \cup (1/3, 0.73] \cup (0.73, 7/8] \cup (7/8, 0.99]$. The sampling
overhead (i.e. $CB/K$) introduced by the observation generator using this partition is shown in Fig. 4. This is by
no means the unique scheme, and the reason for choosing 1/3, 0.73, 7/8 and 0.99 as break points is that we want
to keep the number of intervals small for the sake of presentation, since each interval results in a different design.

Figure 4: The ratio of the total bin number to the sparsity $r = CB/K$ as a function of the index $\delta \in (0, 0.99)$.


Theorem 2 (Oracle-based Peeling Decoder Performance). Consider an input vector with a K-sparse WHT
such that $K = O(N^\delta)$ for some $0 < \delta < 1$. Given an observation generator with C subsampling groups and an
observation set size $B = \eta K$ for some $\eta > 0$, the subsampling-induced graph ensemble $\mathcal{G}(K, \eta, C, \{M_c\}_{c \in [C]})$
guarantees that with probability at least $1 - O(1/K)$, the oracle-based peeling decoder recovers all K unknown
coefficients in time $O(K)$ as long as

• $C = 3$ subsampling groups and $B > 0.4073K$ for $0 < \delta \le 1/3$ (see Section B.3.1);

• $C = 6$ subsampling groups and $B > 0.2616K$ for $1/3 < \delta \le 0.73$ (see Section B.3.2);

• $C = 8$ subsampling groups and $B > 0.2336K$ for $0.73 < \delta \le 0.875$ (see Section B.3.3);

• $C = 8$ subsampling groups and $B > 0.2336K$ for $0.875 < \delta \le 0.99$ (see Section B.3.4).


Proof. Our analysis is similar to the arguments in [21], [22] using the so-called density evolution analysis from
modern coding theory, which tracks the average density of the remaining edges in the graph at each peeling
iteration of the algorithm. Although the proof techniques are similar to those from [21] and [22], the graph used
in our peeling decoder is different from those in [22]. This leads to fairly important differences in the analysis,
such as the degree distributions of the graphs and the expansion properties of the graphs (see the appendix). Hence,
we present an independent analysis here for our peeling decoder. In the following, we provide a brief outline of
the proof elements highlighting the main technical components.


Footnote: We choose to cover the regime $0 < \delta < 0.99$ for the sake of presentation, and one can follow our proof in Appendix B to design
subsampling patterns for $\delta > 0.99$.

Footnote: The density here refers to the fraction of the remaining edges, namely, the number of remaining edges divided by the total number of
edges in the graph.

• Density evolution lemma: We analyze the performance of our peeling decoder over a typical graph
(i.e., cycle-free) of the ensemble $\mathcal{G}(K, \eta, C, \{M_c\}_{c \in [C]})$ for a fixed number of peeling iterations i. We
assume that a local neighborhood of every edge in the graph is cycle-free (tree-like) and derive a recursive
equation that represents the average density of remaining edges in the graph at iteration i. The recursive
equation guarantees that the average density is shrinking as the iterations proceed, as long as the redundancy
parameter $\eta$ is chosen accordingly with respect to the number of groups C for subsampling.

• Convergence to density evolution lemma: Using a Doob martingale argument as in [22] and [2^ , we show
that the local neighborhood of most edges of a random graph from the ensemble $\mathcal{G}(K, \eta, C, \{M_c\}_{c \in [C]})$ is
cycle-free with high probability. This proves that with high probability, our peeling decoder removes all but
an arbitrarily small fraction of the edges in the graph (i.e., the left nodes are removed at the same time after
being decoded) in a constant number of iterations i.

• Graph expansion lemma for complete decoding: We show that if the sub-graph consisting of
the remaining edges is an "expander" (as will be defined later in this section), and if our peeling decoder
successfully removes all but a sufficiently small fraction of the left nodes from the graph, then it removes
all the remaining edges of the graph successfully. As long as the number of subsampling groups C is large
enough for a given sparsity $\delta$, we show that our graph ensemble is an expander with high probability. This
completes the decoding of all the non-zero WHT coefficients.


□ 
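To give a feel for the density evolution step, the sketch below iterates the standard recursion for a left C-regular ensemble whose right degrees are approximately Poisson with mean $1/\eta$, namely $p_{i+1} = (1 - e^{-p_i/\eta})^{C-1}$. This form is an assumption borrowed from the usual peeling analysis of such ensembles; the recursion derived in the paper's lemma may differ in detail, and the function name is hypothetical.

```python
import math

def density_evolution(eta, C, iters=500):
    """
    Iterate p <- (1 - exp(-p / eta))^(C - 1), starting from p = 1.
    p models the fraction of edges still present after each peeling iteration
    on a tree-like neighborhood (assumed standard recursion, not quoted from the paper).
    """
    p = 1.0
    for _ in range(iters):
        p = (1.0 - math.exp(-p / eta)) ** (C - 1)
    return p

# With C = 3 groups, a redundancy above the 0.4073 threshold drives the density to zero,
# while a redundancy below it leaves the recursion stuck at a non-zero fixed point.
p_above = density_evolution(eta=0.45, C=3)
p_below = density_evolution(eta=0.35, C=3)
```

Choosing $\eta$ very close to the threshold still converges under this recursion, but needs many more iterations.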


5 Robust Bin Detection 


We have shown in Section 4.3 that given an oracle for bin detection, our subsampling design for any sparsity
regime $0 < \delta < 1$ guarantees that the peeling decoder successfully recovers all unknown WHT coefficients in the
absence of noise. In the noisy scenario, it is critical to robustify the bin detection scheme by choosing subsampling
offsets differently than in the noiseless setting. In the following, we explain the robust bin detection routine. For
simplicity, we drop the group index c and bin index j when we mention some bin observation. For example, the
observation vector of some bin j from group c is denoted by $U = [U_1, \cdots, U_P]^T$, where the associated set of
offsets is $D = [d_1; \cdots; d_P] \in \mathbb{F}_2^{P \times n}$.


5.1 Performance Guarantees of Robust Bin Detection 


From the noiseless design given in Section 4.2, we can see that the offset signature $(-1)^{Dk}$ associated with each
coefficient in Proposition 2 is the key to decode the unknown index-value pair $(k, X[k])$ of a single-ton. Let
$S = [\cdots, s_k, \cdots]$, with columns

$s_k = (-1)^{Dk}$ for each $k \in \mathbb{F}_2^n$,   (17)

be the offset signature codebook associated with the offset matrix D. Then in the presence of noise,
the bin observation vector can be written as

$U = S\alpha + W$   (18)

for some sparse vector $\alpha = [\cdots, \alpha[k], \cdots]^T$ such that $\alpha[k] = X[k]$ if $M^T k = j$ and $\alpha[k] = 0$ otherwise.

Figure 5: An illustration of a single-ton detection. [Block diagram: BPSK symbols (a codeword from S) are scaled by an
unknown channel gain and corrupted by the noise W to give the bin observation U; the robust bin detection outputs the
decoded index-value pair $(\hat{k}, \hat{X}[\hat{k}])$.]

Clearly, the sparsity of $\alpha$ implies the type of the bin. For example, the underlying bin is a single-ton if $\alpha$ is
1-sparse. It can be further shown from Q that W follows a multivariate Gaussian distribution with zero mean and a
covariance $\mathbb{E}[WW^T] = \nu^2 I$, where $\nu^2 := N\sigma^2/B$.

In the case of single-tons, the observation U can be regarded as the noise-corrupted version of some codeword
from the codebook S (see Fig. 5). In our noiseless design, each codeword $s_k \in \{-1, 1\}^n$ encodes the n-bit index k
into n binary phase-shift keying (BPSK) symbols $(-1)^{k[p]} \in \{\pm 1\}$ for $p \in [n]$. This set of n BPSK symbols is
scaled by the coefficient $X[k]$ and observed as $U_p$ for $p \in [n]$. This resembles the communication scenario where
the goal of a receiver is to decode a sequence of n BPSK symbols with an unknown channel gain. Therefore, when
there is additive noise in the channel, the codebook needs to be re-designed such that it can be robustly decoded.

In general, the vector $\alpha$ is not necessarily 1-sparse (multi-ton bin). Through the robust bin detection scheme,
we can effectively detect the bins carrying some 1-sparse $\alpha$ (i.e. single-tons), and recover the index-value pair
of the 1-sparse coefficient. Then, as the peeling operations proceed, the non-zero coefficients in other bins carrying
an $\alpha$ that is not 1-sparse will be peeled off, which keeps forming new bins carrying 1-sparse vectors (single-tons).

In particular, we first present a straightforward design for near-linear time detection to shed some preliminary 
light on the noisy design, and then proceed to our proposed sub-linear time detection schemes. More specifically, 
we have two sub-linear time detection schemes that impose different sample complexities and computational com¬ 
plexities, called the Sample-Optimal (SO)-SPRIGHT algorithm and the Near Sample-Optimal (NSO)-SPRIGHT 
algorithm respectively. 

Theorem 3. Given the offsets $D \in \mathbb{F}_2^{P \times n}$ chosen by

• Definition 2 for the near-linear time detection scheme, or

• Definition 3 for the NSO-SPRIGHT algorithm and Definition 4 for the SO-SPRIGHT algorithm,

the failure probability $\mathbb{P}_F$ of the peeling decoder in the presence of noise is $O(1/K)$.

Proof. See Appendix D. □

5.2 Near-linear Time Robust Bin Detection: A Random Design 

The near-linear time bin detection scheme follows the principle of using random codes to resolve the different bin 
hypotheses and obtain the index-value pair. 

Definition 2. Let $P = O(\log N)$. The near-linear time detection scheme requires P random offsets $\{d_p\}_{p \in [P]}$
chosen independently and uniformly at random over $\mathbb{F}_2^n$ in every group.

For some $\gamma \in (0, 1)$, the near-linear time detection routine is performed as follows:

• zero-ton verification: for zero-tons, we can expect the energy $\|U\|^2$ to be small relative to the energy of a
single-ton. Therefore, this idea is used to eliminate zero-tons:

$U \sim \mathcal{H}_Z$ if $\frac{1}{P}\|U\|^2 \le (1 + \gamma)\nu^2.$   (19)

• single-ton search: after ruling out zero-tons, the ultimate goal is to identify single-tons in a
certain group c in terms of the underlying index k and the value $X[k]$ in that hash set $\{k : \mathcal{H}_c(k) = j\}$.
Therefore, assuming that the underlying bin j is a single-ton bin, we perform a single-ton search to obtain
the pair of estimates $(\hat{k}, \hat{X}[\hat{k}])$ for peeling. To do so, we employ a Maximum Likelihood Estimate (MLE)
test. For each of the N/B possible coefficient locations k with $M^T k = j$, we obtain the single-ton coefficient as

$\hat{\alpha}[k] = \frac{1}{P} s_k^T U, \quad \forall k$ such that $M^T k = j.$   (20)

Using the MLE of the coefficient, we choose among the locations by finding the location $\hat{k}$ which minimizes
the residual energy:

$\hat{k} = \arg\min_k \|U - \hat{\alpha}[k] s_k\|^2.$   (21)

With the estimated index $\hat{k}$, the value of the coefficient is obtained as

$\hat{X}[\hat{k}] = \rho$ if $s_{\hat{k}}^T U / P \ge 0$, and $\hat{X}[\hat{k}] = -\rho$ if $s_{\hat{k}}^T U / P < 0.$   (22)

• single-ton verification: this step confirms whether the bin is a single-ton via a residual test using the single-ton
search estimates:

$\frac{1}{P}\|U - \hat{X}[\hat{k}] s_{\hat{k}}\|^2 \le (1 + \gamma)\nu^2.$   (23)


Since there are a total of $\eta K$ bins in each of the C subsampling groups and each bin has $P = O(\log N)$
measurements, the SPRIGHT framework using the near-linear time detection scheme leads to a sample cost of
$M = C\eta K P = O(K \log N)$. In terms of complexity, solving the above minimizations requires an exhaustive
search over all indices k with $M^T k = j$ for some bin $j \in \mathbb{F}_2^b$. This leads to an exhaustive search over $O(N/K)$ elements
on average in each peeling iteration, where each element imposes a search complexity of $P = O(\log N)$ by the
generalized likelihood ratio test. As a result, across all $O(K)$ peeling iterations, this results in a total complexity
of $T = O(N/K) \times O(\log N) \times O(K) = O(N \log N)$.
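The three steps of the near-linear time routine can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the function name, the argument `candidates` (the indices k hashing to the bin), and the toy offset matrix are hypothetical, and the coefficient magnitude `rho` and noise level `nu2` are assumed known.

```python
import itertools
import numpy as np

def detect_random_offsets(U, D, candidates, rho, nu2, gamma=0.5):
    """
    U: length-P bin observation; D: P x n binary offset matrix;
    candidates: index vectors k with M^T k = j; rho: coefficient magnitude;
    nu2: noise variance per observation. Returns (type, k_hat, X_hat).
    """
    P = len(U)
    # Zero-ton verification, as in (19).
    if U @ U / P <= (1 + gamma) * nu2:
        return ("zero-ton", None, None)
    # Single-ton search, as in (20)-(21): exhaustive search over the candidates.
    best = None
    for k in candidates:
        s = (-1.0) ** (D @ k % 2)          # offset signature s_k
        alpha = s @ U / P                  # estimate of the coefficient
        res = np.sum((U - alpha * s) ** 2)
        if best is None or res < best[0]:
            best = (res, k, s, alpha)
    _, k_hat, s_hat, alpha_hat = best
    x_hat = rho if alpha_hat >= 0 else -rho    # value estimate, as in (22)
    # Single-ton verification, as in (23).
    if np.sum((U - x_hat * s_hat) ** 2) / P <= (1 + gamma) * nu2:
        return ("single-ton", k_hat, x_hat)
    return ("multi-ton", None, None)

# Deterministic toy instance (n = 3, P = 4): a zero offset followed by the identity.
D = np.vstack([np.zeros((1, 3), dtype=int), np.eye(3, dtype=int)])
candidates = [np.array(b) for b in itertools.product([0, 1], repeat=3)]
w = np.array([0.05, -0.05, 0.05, -0.05])                 # small fixed perturbation
U_single = -1.0 * (-1.0) ** (D @ np.array([1, 0, 1]) % 2) + w
out = detect_random_offsets(U_single, D, candidates, rho=1.0, nu2=0.01)
```

The loop over `candidates` is the exhaustive search that makes this scheme near-linear rather than sub-linear.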


5.3 Sub-linear Time Robust Bin Detection 


Inspired by the near-linear time bin detection scheme, we devise two simple schemes to achieve the same perfor¬ 
mance with sub-linear time complexity. Recall that the robust bin detection involves three steps: 

1) zero-ton verification $\frac{1}{P}\|U\|^2 \le (1 + \gamma)\nu^2$;

2) single-ton search that estimates the index-value pair $(\hat{k}, \hat{X}[\hat{k}])$;

3) single-ton verification $\frac{1}{P}\|U - \hat{X}[\hat{k}] s_{\hat{k}}\|^2 \le (1 + \gamma)\nu^2$.

The near-linear time design is a straightforward construction of the offset matrix $D \in \mathbb{F}_2^{P \times n}$ to guarantee success
for step (1) and step (3). However, it does not optimize its choice of offsets to facilitate step (2) in the noisy setting,
which causes the high complexity.

To avoid the joint estimation and detection approach in the near-linear time scheme, we use different offsets to 
tackle them separately. We perform the single-ton search using some offsets, while using other offsets for zero-ton 
and single-ton verifications. Since the fully random offsets already tackle the verifications with high probability, 
we can simply focus on designing offsets for the single-ton search. If the single-ton search can be performed with 
high probability of success using the same amount of samples and computations (in an order-sense), the entire bin 
detection scheme becomes sub-linear, as discussed in details below. 


Proposition 3. Given a single-ton bin with $(k, X[k])$ observed in noise

$U_p = X[k] (-1)^{\langle d_p, k \rangle} + W_p, \quad p \in [P],$   (24)

the sign of each observation satisfies

$\mathrm{sgn}[U_p] = \langle d_p, k \rangle \oplus \mathrm{sgn}[X[k]] \oplus Z_p, \quad p \in [P],$   (25)

where $Z_p$ is a Bernoulli random variable with probability upper bounded as $\mathbb{P}_e = e^{-\frac{\eta}{2}\mathrm{SNR}}$.

Proof. See Appendix C. □

From Proposition 3, it can be seen that the sign vector of the bin observation vector U can be viewed as some
potentially corrupted bits received over a binary symmetric channel (BSC). The design of the offset matrix D for
reliable and fast decoding over the BSC is thus the key to achieving sub-linear complexity.

In the following, we first present the sub-linear time NSO-SPRIGHT algorithm, which is easy to implement (i.e. a
majority vote) and achieves a sub-linear complexity $T = O(K \log^3 N)$ with a sample cost of $M = O(K \log^2 N)$.
Then, we present the sub-linear time SO-SPRIGHT algorithm, which maintains the optimal sample cost
$M = O(K \log N)$ and simultaneously achieves sub-linear complexity $T = O(K \log^2 N)$ using an iterative
channel decoder.

5.3.1 The NSO-SPRIGHT Algorithm 


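Figure 6: A simplified flowchart of the bin detection routine $\psi$ for the noisy setting by choosing offsets according to
Theorem 3 for the NSO-SPRIGHT and the SO-SPRIGHT algorithm. [Flowchart: the zero-ton verification tests
$\frac{1}{P}\|U\|^2 \le (1 + \gamma)\nu^2$; if yes, the bin is a zero-ton, $U \sim \mathcal{H}_Z$. Otherwise the single-ton search (majority vote for
NSO-SPRIGHT, iterative decoding for SO-SPRIGHT) produces $(\hat{k}, \hat{X}[\hat{k}])$, and the single-ton verification tests
$\frac{1}{P}\|U - \hat{X}[\hat{k}] s_{\hat{k}}\|^2 \le (1 + \gamma)\nu^2$; if yes, the bin is a single-ton, $U \sim \mathcal{H}_S(\hat{k}, \hat{X}[\hat{k}])$, and otherwise a multi-ton,
$U \sim \mathcal{H}_M$.]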


Recall that the near-linear time design requires an exhaustive search due to the lack of structure of fully random 
offsets, which creates a bottleneck of the complexity. The key is to design a set of offsets that constitute a suf¬ 
ficiently good codebook to allow reliable transmissions of the n-bit index k over a BSC. In order to enable the 
bit-by-bit recovery of the binary representation of k as in the noiseless design, the first coding strategy we exploit 
is repetition coding, which is done by imposing structures on the random offsets for subsampling. 

Definition 3. Let $P = P_1 P_2$ with $P_1 = O(n)$ and $P_2 = n$. The NSO-SPRIGHT algorithm requires $P_1$ random
offsets $\{d_p\}_{p \in [P_1]}$ chosen independently and uniformly over $\mathbb{F}_2^n$ and $P_2$ modulated offsets such that

$d_{p,q} \oplus d_p = e_q, \quad q \in [P_2],$   (26)

where $e_q$ is the q-th column of the identity matrix.

Given the offsets chosen as in Definition 3, we can identify the q-th bit of k by jointly considering the $P_1$ observations
associated with offsets $d_{p,q}$ across $p \in [P_1]$. More specifically,

$\mathrm{sgn}[U_{p,q}] = \langle d_{p,q}, k \rangle \oplus \mathrm{sgn}[X[k]] \oplus Z_{p,q}$   (27)

$\mathrm{sgn}[U_p] = \langle d_p, k \rangle \oplus \mathrm{sgn}[X[k]] \oplus Z_p.$   (28)

Since $d_{p,q} \oplus d_p = e_q$, we have $P_1$ corrupted versions of $k[q]$:

$\mathrm{sgn}[U_{p,q}] \oplus \mathrm{sgn}[U_p] = \langle e_q, k \rangle \oplus Z'_{p,q} = k[q] \oplus Z'_{p,q},$   (29)

where $Z'_{p,q} = Z_p \oplus Z_{p,q}$ is another Bernoulli variable with $\theta = \Pr(Z'_{p,q} = 1) = 2\mathbb{P}_e(1 - \mathbb{P}_e) < 1/2$. Then the
MLE of $k[q]$ given the observations $\{\mathrm{sgn}[U_{p,q}] \oplus \mathrm{sgn}[U_p]\}_{p=1}^{P_1}$ can be obtained as

$\hat{k}[q] = \arg\max_{a \in \mathbb{F}_2} \prod_{p=1}^{P_1} \theta^{(\mathrm{sgn}[U_{p,q}] \oplus \mathrm{sgn}[U_p] \oplus a)} (1 - \theta)^{1 - (\mathrm{sgn}[U_{p,q}] \oplus \mathrm{sgn}[U_p] \oplus a)}.$   (30)

Using the fact that $\theta < 1/2$ such that $\log(\theta/(1 - \theta)) < 0$, we can simplify the objective as

$\hat{k}[q] = \arg\min_{a \in \mathbb{F}_2} \sum_{p=1}^{P_1} \mathrm{sgn}[U_{p,q}] \oplus \mathrm{sgn}[U_p] \oplus a.$   (31)

In other words, the decoding scheme for the q-th bit of the index k becomes a simple majority test by accumulating
$P_1 = O(n)$ random signs $\mathrm{sgn}[U_{p,q}] \oplus \mathrm{sgn}[U_p]$. Using the estimated bits $\{\hat{k}[q]\}$ together with $M^T k = j$, the
estimate $\hat{k}$ can be obtained accordingly. Finally, the value of the coefficient is obtained as in (22). The zero-ton and
single-ton verifications can be performed directly using the measurements associated with offsets $d_p$ since there
are $P_1 = O(n) = O(\log N)$ such random offsets, which have been shown to achieve high probability of success
in the near-linear time design.

From Definition 3, we can see that there are a total of $P_1 P_2 = O(n^2)$ offsets, and therefore each bin has
$O(\log^2 N)$ observations. As a result, the NSO-SPRIGHT algorithm leads to a sample cost of $M = C\eta K \log^2 N =
O(K \log^2 N)$. In terms of complexity, the majority vote requires $O(\log^2 N)$ operations for each bin, contributing to
a total of $O(K \log^2 N)$ operations across all $O(K)$ bins. However, this complexity is dominated by generating $P =
P_1 P_2 = \log^2 N$ basic observation sets from B-point WHTs, each imposing an extra complexity of $O(K \log K) =
O(K \log N)$ because of $K = O(N^\delta)$. As a result, this gives a total complexity of $T = O(K \log^3 N)$.
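The majority test in (31) is short enough to sketch directly. The helper names `nso_offsets` and `majority_decode` and the array layout are hypothetical; the sketch assumes one reference observation per random offset $d_p$ and one observation per modulated offset $d_{p,q} = d_p \oplus e_q$.

```python
import numpy as np

def sgn(x):
    """Sign convention used in the text: 1 if x < 0, and 0 otherwise (element-wise)."""
    return (np.asarray(x) < 0).astype(int)

def nso_offsets(n, P1, rng):
    """P1 random offsets d_p and, for each, n modulated offsets d_{p,q} = d_p XOR e_q."""
    D_rand = rng.integers(0, 2, size=(P1, n))
    D_mod = D_rand[:, None, :] ^ np.eye(n, dtype=int)[None, :, :]
    return D_rand, D_mod

def majority_decode(U_rand, U_mod):
    """
    U_rand: shape (P1,), observations for the offsets d_p.
    U_mod:  shape (P1, n), observations for the offsets d_{p,q}.
    Each vote sgn[U_{p,q}] XOR sgn[U_p] is a noisy copy of k[q]; take the majority over p.
    """
    votes = sgn(U_mod) ^ sgn(U_rand)[:, None]
    return (2 * votes.sum(axis=0) > votes.shape[0]).astype(int)

# Toy single-ton: k = [1, 0, 1, 1, 0], X[k] = -3, with 2 of the 7 votes per bit flipped.
rng = np.random.default_rng(0)
n, P1 = 5, 7
k = np.array([1, 0, 1, 1, 0])
D_rand, D_mod = nso_offsets(n, P1, rng)
X = -3.0
U_rand = X * (-1.0) ** (D_rand @ k % 2)
U_mod = X * (-1.0) ** (D_mod @ k % 2)
U_mod[:2] *= -1                      # emulate sign errors on the first two rows
k_hat = majority_decode(U_rand, U_mod)
```

Because each vote cancels both $\langle d_p, k \rangle$ and the nuisance sign, the decoder never needs to know the random offsets themselves.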

5.3.2 The SO-SPRIGHT Algorithm 

While the NSO-SPRIGHT algorithm exploits repetition codes induced by the random offsets to improve robustness
to noise, we can further use better error correction codes to guide the choice of offsets. This is slightly more
difficult to implement in practice since the decoding requires a channel decoder instead of a simple majority vote,
but the resulting sample complexity and computational complexity are order-optimal.

Definition 4. Let $P = \sum_{i=1}^{3} P_i$ with $P_i = O(n)$ for $i = 1, 2, 3$. The SO-SPRIGHT algorithm requires $P_1$ random
offsets $d_p$ for $p = 1, \cdots, P_1$ chosen independently and uniformly at random over $\mathbb{F}_2^n$, $P_2$ zero offsets $d_p = 0$
for $p = P_1 + 1, \cdots, P_1 + P_2$, and finally $P_3$ coded offsets $d_p$ for $p = P_1 + P_2 + 1, \cdots, P$ such that the offset
matrix $G = [\cdots; d_p; \cdots] \in \mathbb{F}_2^{P_3 \times n}$ constitutes a generator matrix of some linear block code with a minimum
distance $\beta P_3$ with $\beta > \mathbb{P}_e$.

Recalling Proposition 3, the observations associated with the coded offsets G can be written as

$\begin{bmatrix} \mathrm{sgn}[U_{P_1+P_2+1}] \\ \vdots \\ \mathrm{sgn}[U_P] \end{bmatrix} = Gk \oplus \mathrm{sgn}[X[k]] \oplus \begin{bmatrix} Z_{P_1+P_2+1} \\ \vdots \\ Z_P \end{bmatrix}.$   (32)

Note that there is a nuisance sign $\mathrm{sgn}[X[k]]$ which is unknown to the robust bin detector. To illustrate our scheme,
we first assume that there is a genie that informs the decoder of the sign of the coefficient $\mathrm{sgn}[X[k]]$, and then we
discuss how to get rid of the genie.

• when $\mathrm{sgn}[X[k]]$ is known a priori: in this case, we can easily obtain

$\begin{bmatrix} \mathrm{sgn}[U_{P_1+P_2+1}] \oplus \mathrm{sgn}[X[k]] \\ \vdots \\ \mathrm{sgn}[U_P] \oplus \mathrm{sgn}[X[k]] \end{bmatrix} = Gk \oplus \begin{bmatrix} Z_{P_1+P_2+1} \\ \vdots \\ Z_P \end{bmatrix}.$   (33)

Since there are n information bits in the index k, there exists some channel code (i.e. G) with block
length $P_3 = n/R(\beta)$ that achieves a minimum distance of $\beta P_3$, where $R(\beta)$ is the rate of the code. As long
as $\beta > \mathbb{P}_e$, the unknown k can be decoded with exponentially decaying probability of error.
There exist many codes that satisfy the minimum distance properties, but the concern is the decoding time. It
is desirable to have decoding time linear in the block length so that the sample complexity and computational
complexity can be maintained at $O(n)$, the same as in the noiseless case. Excellent examples include the class of
expander codes or LDPC codes that allow for linear time decoding.

• when $\mathrm{sgn}[X[k]]$ is not known a priori: we consider the observations associated with all the zero offsets
$d_p = 0$ for $p = P_1 + 1, \cdots, P_1 + P_2$:

$\begin{bmatrix} \mathrm{sgn}[U_{P_1+1}] \\ \vdots \\ \mathrm{sgn}[U_{P_1+P_2}] \end{bmatrix} = \mathrm{sgn}[X[k]] \oplus \begin{bmatrix} Z_{P_1+1} \\ \vdots \\ Z_{P_1+P_2} \end{bmatrix},$   (34)

which can recover the sign correctly, $\widehat{\mathrm{sgn}}[X[k]] = \mathrm{sgn}[X[k]]$, with high probability using a majority test
(assuming $\mathbb{P}_e < 1/2$). If $\mathbb{P}_e > 1/2$, the sign is obtained accordingly using a minority test. Then we can
proceed as if the sign were known a priori:

$\begin{bmatrix} \mathrm{sgn}[U_{P_1+P_2+1}] \oplus \widehat{\mathrm{sgn}}[X[k]] \\ \vdots \\ \mathrm{sgn}[U_P] \oplus \widehat{\mathrm{sgn}}[X[k]] \end{bmatrix} = Gk \oplus \begin{bmatrix} Z_{P_1+P_2+1} \\ \vdots \\ Z_P \end{bmatrix}.$   (35)

Finally, the value of the coefficient is obtained as in (22). The zero-ton and single-ton verifications can be performed
directly using the observations associated with the random offsets $d_p$, since there are $P_1 = O(n) = O(\log N)$ such random
offsets, which have been shown to achieve high probability of success in the near-linear time design.

Using the SO-SPRIGHT design, we can see that there are three sets of offsets, where one set includes $P_3 =
O(n)$ offsets for the single-ton search, the second set includes $P_2 = O(n)$ zero offsets for the sign reference,
and the third set includes $P_1 = O(n)$ random offsets for the zero-ton and single-ton verifications. Therefore, we have a total of
$P = \sum_i P_i = O(n) = O(\log N)$ offsets and each bin has $O(\log N)$ observations. As a result, the SO-SPRIGHT
algorithm leads to a sample cost of $M = C\eta K P = O(K \log N)$, which is the same as in the noiseless case of Scheibler et al. In
terms of complexity, if G is a properly chosen channel code generator matrix from the class of expander codes or
LDPC codes, the decoding time for the index requires $O(n) = O(\log N)$ operations for each bin. This contributes
to a total of $O(K \log N)$ complexity across all $O(K)$ bins. However, this complexity is dominated by subsampling
for generating P basic observation sets from B-point WHTs, each imposing an extra complexity of $O(K \log K) =
O(K \log N)$ because of $K = O(N^\delta)$. As a result, this gives a total complexity of $T = O(K \log^2 N)$, which is
also the same as in the noiseless case of Scheibler et al.
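The two-stage SO-SPRIGHT search (sign recovery from the zero offsets, then channel decoding of the coded offsets) can be sketched as below. To keep the example short, the linear-time expander or LDPC decoder is swapped for a brute-force nearest-codeword search, which is exponential in n and only suitable for toy sizes; the function name and the toy generator matrix (three stacked identities) are hypothetical.

```python
import itertools
import numpy as np

def so_decode(sign_zero, sign_coded, G):
    """
    sign_zero:  P2 sign bits observed with the zero offsets, as in (34).
    sign_coded: P3 sign bits observed with the coded offsets, as in (32).
    G:          P3 x n binary generator matrix.
    Returns (estimated sign bit, estimated index bits).
    """
    # Majority test over the zero-offset observations recovers sgn[X[k]].
    s_hat = int(2 * np.sum(sign_zero) > len(sign_zero))
    # Remove the nuisance sign, as in (35).
    r = np.asarray(sign_coded) ^ s_hat
    # Stand-in decoder: exhaustive minimum Hamming distance search over all k.
    best_k, best_dist = None, None
    for k in itertools.product([0, 1], repeat=G.shape[1]):
        dist = int(np.sum((G @ np.array(k)) % 2 != r))
        if best_dist is None or dist < best_dist:
            best_k, best_dist = k, dist
    return s_hat, list(best_k)

# Toy code with n = 3 and P3 = 9 (minimum distance 3, so one bit error is corrected).
G = np.vstack([np.eye(3, dtype=int)] * 3)
k = np.array([1, 0, 1])
sign_x = 1                                   # X[k] < 0 under the text's convention
sign_coded = (G @ k) % 2 ^ sign_x
sign_coded[4] ^= 1                           # one corrupted bit among the coded offsets
sign_zero = np.array([1, 1, 0, 1, 1])        # one corrupted bit among the zero offsets
s_hat, k_hat = so_decode(sign_zero, sign_coded, G)
```

With a code that admits linear-time decoding, the loop over all candidate indices would be replaced by the iterative decoder, which is what keeps the per-bin cost at $O(n)$.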


6 Applications 

In the following, we provide some machine learning concepts that can be cast as a WHT computation or expansion. 

Example 2 (Pseudo-Boolean Function and Sparse Polynomial). An arbitrary pseudo-Boolean function can be
represented uniquely by a multi-linear polynomial over the hypercube $(z_1, \cdots, z_n) \in \{-1, +1\}^n$:

$f(z_1, \cdots, z_n) = \sum_{S \subseteq [n]} \alpha_S \prod_{i \in S} z_i, \quad \forall z_i \in \{-1, +1\},$   (36)

where S is a subset of $[n] := \{1, \cdots, n\}$, and $\alpha_S$ is the Walsh (Fourier) coefficient associated with the monomial
$\prod_{i \in S} z_i$. If we replace $z_i$ by $(-1)^{m[i]}$ such that $z_i = -1$ when $m[i] = 1$ and $z_i = 1$ when $m[i] = 0$, we have
$x[m] = f((-1)^{m[1]}, \cdots, (-1)^{m[n]})$ for $m \in \mathbb{F}_2^n$ and $X[k] = \sqrt{N}\alpha_S$ such that $\mathrm{supp}(k) = S$.

Example 3 (Set Functions). A set function is an arbitrary real-valued function $f : 2^{[n]} \to \mathbb{R}$ defined for every
element in the power set $Z \in 2^{[n]}$, which has a Walsh expansion given by

$f(Z) = \frac{1}{\sqrt{N}} \sum_{S \in 2^{[n]}} \hat{f}(S) (-1)^{|S \cap Z|},$   (37)

where $\hat{f}(S)$ is the Walsh (Fourier) coefficient. Clearly, a set function can also be viewed as an n-ary pseudo-Boolean
function in (36) such that $f(Z) = f(z_1, \cdots, z_n)$ as long as $z_i = -1$ if $i \in Z$ and $z_i = 1$ if $i \notin Z$. Therefore,
each function value $f(Z)$ can be regarded as a sample $x[m] = f(\mathrm{supp}(m))$, where the Walsh coefficient satisfies
$X[k] = \hat{f}(S)$ as long as $\mathrm{supp}(k) = S$.

Example 4 (Decision Tree Learning). Decision trees are machine-learning methods for constructing prediction
models from data, whose goal is to predict the value of a target label f based on n input variables $z_i \in \{\pm 1\}$
for $i \in [n]$. More specifically, this includes classification trees (discrete-valued outcome $f \in \mathbb{Z}$) and regression
trees (real-valued outcome $f \in \mathbb{R}$). Decision tree models are usually constructed top-down starting at the
root node, by choosing a certain variable $z_i$ for some i at each step that optimally splits the set of training data
with respect to some measure of goodness. Hence, for each set of input variables $(z_1, \cdots, z_n) \in \{-1, +1\}^n$, there
is a unique leaf node in the tree that assigns the target label f. This is mathematically equivalent to learning a
(pseudo)-Boolean function, which can be cast as a problem of computing the WHT of f.

It has been found that many instances of the examples above exhibit sparsity in the Walsh spectrum. In general,
our SPRIGHT framework can be applied to learning K-sparse pseudo-Boolean polynomials $f : \{\pm 1\}^n \to \mathbb{R}$ with
n variables. A concrete example is in decision tree learning, where the underlying (pseudo)-Boolean function
has a sparse spectrum if the decision tree has few leaf nodes with short depth. An extreme case would be when
the underlying function only depends on few input variables, which is also referred to as the juntas problem in
Boolean analysis. Therefore, if the K-sparse N-point WHT can be computed efficiently, these machine learning
applications can benefit greatly from the reductions in both the sample complexity and computational complexity.
In the following, we present a specific machine learning application in graph sketching.


6.1 Applications in Hypergraph Sketching 


A hypergraph, denofed by ^ = {V,8), is a generalized notion of graphs where each edge e ^ £, called fhe 
hyperedges, can connecf more fhan fwo nodes in fhe node sef V. Hypergraph skefching here refers fo fhe procedure 
of identifying fhe unknown hypergraph sfrucfure from cuf queries. Hypergraphs have been very useful in relafional 
learning, which has received exfensive affenfions in recenf years since many real-world dafa are organized by fhe 
relations befween enfifies. Some of fhe inferesfing problems involved in relafional learning include fhe discovery 
of communifies, classificafion, and predicfions of possible new relafions. 

We describe fhe hypergraph skefching applicafion fhrough an example depicfed in Fig. [7] Consider a scenario 
where fhere are n books from a cerfain provider (e.g. Amazon) and each book is characterized by a node in fhe 
graph. There are numerous transacfions faking place in which each cusfomer buys a few books. In fhis setting, 
fhe relafionship befween books in each fransacfion can be capfured by a hyperedge, which connecfs fhe subsef of 
books boughf in fhe same fransacfion. A carfoon illusfrafion is depicfed in Fig. [7] where fhere are 3 disfincf sefs 
of books boughf in differenf fransacfions wifh each sef coded in differenf colors. Then, fhe hypergraph skefching 
problem is equivalenf fo solving fhe following problem under a fhe following query model: 


Pick an arbifrary partition (S, S) of n books such fhaf S L) S = V (see Fig. 7(b) i. 

One can query fhe following: i) are fhere any fransacfions fhaf include books from bofh sefs (5, S)7 and ii) 
if fhere are, whaf is fhe fofal number of fransacfions fhaf satisfy fhis requiremenf? For example in Fig. 7(c)[ 
fhe resulfing query would refurn 1 since fhere is only 1 fransacfion fhaf includes books from bofh sefs. 


• How many such queries are needed fo fully learn all fhe unknown disfincf subsefs of books fhaf are boughf 
in differenf fransacfions? 


Nofe fhaf fhe query requested here is in facf fhe number of hyperedges fhaf cross over fhe fwo sefs (S, S), which 
is defined as fhe cut value of fhe graph. As shown nexf, fhis can be mafhemafically esfablished as a sparse WHT 
compufafion problem, where our SPRIGHT framework is found fo be useful. 

^It is well-known that learning juntas using random samples is NP-hard. Our framework tackles the juntas problem using specifically 
chosen samples, and hence we can achieve sub-linear sample cost and run-time. This is not a contradiction. 
[Figure 7 panels: (a) Hidden graph of n books: there are a few purchase patterns, where each corresponds to a hyperedge. (b) Pick some partition $(S, \bar{S})$: how many transactions include books from both sets $(S, \bar{S})$? (c) Query: in this example, the query result for this partition is 1 and the graph has 3 distinct subsets.]

Figure 7: Given a set of n books, infer the graph structure by querying graph cuts.


Let $|V| = n$ and $|E| = s$. A cut $S \subseteq V$ is a set of selected vertices, denoted by the binary n-tuple $\mathbf{m} = [m[1], \cdots, m[n]]$ over $\mathbb{F}_2^n$, where $m[i] = 1$ if $i \in S$ and $m[i] = 0$ if $i \notin S$. The cut value $x[\mathbf{m}]$ for a specific cut
$\mathbf{m}$ in the hypergraph is defined as $x[\mathbf{m}] = |\{e \in E : e \cap S \neq \emptyset,\ e \cap \bar{S} \neq \emptyset\}|$, where $\bar{S} = V \setminus S$. In other words,
the cut value corresponds to the number of hyperedges that cross between the two sets $(S, \bar{S})$. Given a partition
$\mathbf{m} \in \mathbb{F}_2^n$, for some edge $e \in E$, we define the following function to indicate whether it crosses over the two sets $(S, \bar{S})$:

$$\mathbb{1}_e[\mathbf{m}] = \prod_{i \in e} \frac{1 + (-1)^{m[i]}}{2} + \prod_{i \in e} \frac{1 - (-1)^{m[i]}}{2}. \tag{38}$$


For example, if all the nodes $i \in e$ connected through this particular hyperedge are on the same side of the partition
$(S, \bar{S})$, which implies that either $m[i] = 0$ for all $i \in e$ or $m[i] = 1$ for all $i \in e$, the indicator is $\mathbb{1}_e[\mathbf{m}] = 1$. In other words,
when the edge e does not cross over the two sets $(S, \bar{S})$, the indicator takes the value 1. Therefore, the total
count of edges that do cross over can be obtained accordingly as

$$x[\mathbf{m}] = \sum_{e \in E} \left(1 - \mathbb{1}_e[\mathbf{m}]\right). \tag{39}$$
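As a small illustration of the cut query and of the indicator-based count in (38) and (39), the following Python sketch evaluates both on a toy hypergraph. The hypergraph, the 0-based node labels, and the function names are hypothetical and chosen only for illustration.

```python
from itertools import product


def cut_value(edges, m):
    """Direct definition: count hyperedges with nodes on both sides of the cut m."""
    return sum(1 for e in edges if len({m[i] for i in e}) == 2)


def indicator(e, m):
    """Indicator in the style of (38): 1 if all nodes of e lie on one side of m."""
    all_zero = 1
    all_one = 1
    for i in e:
        sign = (-1) ** m[i]
        all_zero *= (1 + sign) // 2  # factor is 1 only when m[i] == 0
        all_one *= (1 - sign) // 2   # factor is 1 only when m[i] == 1
    return all_zero + all_one


def cut_value_from_indicator(edges, m):
    """Cut value in the style of (39): sum of (1 - indicator) over hyperedges."""
    return sum(1 - indicator(e, m) for e in edges)


# Hypothetical toy hypergraph: n = 5 nodes (0-indexed), s = 3 hyperedges.
n = 5
edges = [(0, 1, 2), (2, 3), (1, 3, 4)]

for m in product([0, 1], repeat=n):
    assert cut_value(edges, m) == cut_value_from_indicator(edges, m)
```

Both routines agree on all $2^n$ cuts of the toy example, which is what the indicator construction is meant to guarantee.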


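By substituting $\mathbb{1}_e[\mathbf{m}]$ with (38), it can be equivalently written as a WHT expansion as follows:

$$x[\mathbf{m}] = \sum_{\mathbf{k} \in \mathbb{F}_2^n} X[\mathbf{k}]\, (-1)^{\langle \mathbf{k}, \mathbf{m} \rangle}, \tag{40}$$

where the coefficient $X[\mathbf{k}]$ is a scaled WHT coefficient such that $X[\mathbf{0}] = \sum_{e \in E} \left(1 - \frac{1}{2^{|e|-1}}\right)$ and, for $\mathbf{k} \neq \mathbf{0}$,

$$X[\mathbf{k}] = \begin{cases} -\sum_{e \in E:\ \mathrm{supp}(\mathbf{k}) \subseteq e} \dfrac{1}{2^{|e|-1}}, & \text{if } |\mathrm{supp}(\mathbf{k})| \text{ is even} \\ 0, & \text{otherwise.} \end{cases} \tag{41}$$

Clearly, if the number of hyperedges is small, $s \ll 2^n$, and the maximum size of each hyperedge is small, the
coefficients $X[\mathbf{k}]$ are sparse. For example, if the hyperedge size can be universally bounded by d, each hyperedge
contributes at most $2^{d-1}$ even-sized subsets, so the sparsity can be well upper bounded by $K \le s\, 2^{d-1}$.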
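The sparsity of the expansion can be checked numerically. The sketch below builds the coefficients by expanding the two products in the indicator (each hyperedge contributes a weight $-1/2^{|e|-1}$ to every even-sized subset of its nodes, plus a constant 1), and then verifies that the resulting expansion reproduces every cut value. The toy hypergraph and all function names are hypothetical; this is a sanity check under those assumptions, not the SPRIGHT algorithm itself.

```python
from itertools import combinations, product


def cut_value(edges, m):
    """Number of hyperedges with nodes on both sides of the cut m."""
    return sum(1 for e in edges if len({m[i] for i in e}) == 2)


def sketch_coefficients(edges, n):
    """Return {k: X[k]} for the scaled WHT expansion of the cut function."""
    zero = (0,) * n
    X = {zero: 0.0}
    for e in edges:
        w = 1.0 / 2 ** (len(e) - 1)
        X[zero] += 1.0  # the constant 1 in (1 - indicator)
        for size in range(0, len(e) + 1, 2):  # even-sized subsets, incl. empty
            for subset in combinations(e, size):
                k = tuple(1 if i in subset else 0 for i in range(n))
                X[k] = X.get(k, 0.0) - w
    return {k: v for k, v in X.items() if v != 0.0}


def evaluate(X, m):
    """Evaluate sum_k X[k] * (-1)^<k, m>."""
    return sum(v * (-1) ** sum(ki * mi for ki, mi in zip(k, m))
               for k, v in X.items())


# Hypothetical toy hypergraph: n = 5 nodes, s = 3 hyperedges, size at most d = 3.
n, d = 5, 3
edges = [(0, 1, 2), (2, 3), (1, 3, 4)]
X = sketch_coefficients(edges, n)

for m in product([0, 1], repeat=n):
    assert abs(evaluate(X, m) - cut_value(edges, m)) < 1e-9
assert len(X) <= len(edges) * 2 ** (d - 1)
```

Only a handful of the $2^n = 32$ coefficients are non-zero here, in line with the bound on K.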


6.2 Simple Experiment 

Here we consider the noiseless scenario as a proof of concept. We use our SO-SPRIGHT algorithm for hypergraph
sketching, which requires $O(Kn)$ queries for interpolating the total $2^n$ cut values with run-time $O(Kn^2)$. In this
experiment, we randomly generate hypergraphs with n = 50 to 400 nodes with s = 3, 6, 9 edges, where each
edge does not connect more than d = 6 nodes. As can be seen, our SPRIGHT framework computes the sparse
coefficients $X[\mathbf{k}]$ in time $O(K \log^2 N) = O(Kn^2)$ from only $O(Kn)$ cut queries.
[Figure: SPRIGHT Hypergraph Sketching. (a) Query cost scaling with the graph size n. (b) Run-time scaling with the graph size n.]


7 Numerical Experiments 

In this section, we test the NSO-SPRIGHT algorithm and SO-SPRIGHT algorithm respectively. We first showcase 
the performances in many settings by varying the signal length $N = 2^n$, sparsity and SNR. Then, we demonstrate
possible applications of our SPRIGHT framework in machine learning domains such as hypergraph sketching and 
decision tree learning over large datasets. 

7.1 Performance of the SPRIGHT Framework 

Here, we synthetically generate time domain samples x from a K-sparse WHT signal X of length $N = 2^n$ with
K randomly positioned non-zero coefficients of magnitude $\pm\rho$. The setup of our experiments is given below:

• subsampling parameters: we fix the number of groups to C = 3 and the number of bins in each group is
$B = 2^b$ where $b = \lceil \log_2(K) \rceil$. Note that in this case $B \approx K$ and thus $\eta \approx 1$.

• NSO-SPRIGHT algorithm parameters: we choose $P_1 = 2n$ random offsets and $P_2 = n$ modulated offsets.
Thus the sample cost is $M_{\mathrm{NSO}} = 2CBn^2 \approx 6Kn^2$ and the complexity is $T_{\mathrm{NSO}} = O(Kn^3)$.

• SO-SPRIGHT algorithm parameters: we choose $P_1 = 2n$ coded offsets for the single-ton search, $P_2 = n$
zero offsets and $P_3 = n$ random offsets for the zero-ton and single-ton verifications. For the single-ton
search, the $P_1 = 2n$ coded offsets are chosen to induce a (3, 6)-regular LDPC code, where the search
utilizes Gallager's bit flipping algorithm for decoding, which imposes linear run-time $O(n)$. The sample
cost is $M_{\mathrm{SO}} = 4CBn \approx 12Kn$ and the complexity is $T_{\mathrm{SO}} = O(Kn^2)$.
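The sample costs quoted in the setup can be tabulated with a few lines of Python. This is only a bookkeeping sketch: the function name is hypothetical, and treating the NSO-SPRIGHT offsets as $P_1 \times P_2$ combinations is an assumption inferred from the stated cost $2CBn^2$.

```python
import math


def experiment_costs(K, n, C=3):
    """Bins and sample costs for the setup above (C groups, B = 2^ceil(log2 K))."""
    b = math.ceil(math.log2(K))
    B = 2 ** b
    P_nso = (2 * n) * n      # assumed: P1 random offsets times P2 modulated offsets
    P_so = 2 * n + n + n     # coded + zero + random offsets
    return {
        "B": B,
        "eta": B / K,              # redundancy B / K
        "M_nso": C * B * P_nso,    # equals 2 * C * B * n^2
        "M_so": C * B * P_so,      # equals 4 * C * B * n
    }


costs = experiment_costs(K=16, n=14)
print(costs)
```

For K a power of two the redundancy is exactly 1, and the ratio of the two sample costs is n/2, which reflects the extra factor of n in the NSO-SPRIGHT cost.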

7.1.1 Noise Robustness 

In this subsection, we compare the noise robustness of the NSO-SPRIGHT and SO-SPRIGHT algorithms. The 
experiment settings are given below: 

• input profile: we generate a sparse WHT vector of length $N = 2^n$ with n = 14 and K = 10, 20, 40 non-zero
coefficients respectively. Therefore, the signal dimension is N = 16384. The non-zero WHT coefficients
are chosen with uniformly random support and random amplitudes {±1}. The input signal samples x are
obtained by taking the inverse WHT of the sparse WHT vector and adding i.i.d. Gaussian noise samples
with variance $\sigma^2$ determined by the range of SNR = [−5 : 5 : 20] dB.
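A minimal sketch of how such an input profile could be generated is given below. It is not the authors' experiment code: the orthonormal $1/\sqrt{N}$ scaling of the WHT and the SNR convention (average signal power per sample divided by the noise variance) are assumptions made here for illustration, as are the function names.

```python
import math
import random


def fwht(a):
    """Fast Walsh-Hadamard transform with unnormalized butterflies."""
    a = list(a)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a


def noisy_samples(n, K, snr_db, rng):
    """Return (X, x, y): sparse spectrum, clean samples, noisy samples."""
    N = 2 ** n
    X = [0.0] * N
    for k in rng.sample(range(N), K):          # uniformly random support
        X[k] = rng.choice([-1.0, 1.0])         # random amplitudes +/- 1
    x = [v / math.sqrt(N) for v in fwht(X)]    # assumed orthonormal inverse WHT
    signal_power = sum(v * v for v in x) / N   # assumed SNR convention
    sigma = math.sqrt(signal_power / 10 ** (snr_db / 10))
    y = [v + rng.gauss(0.0, sigma) for v in x]
    return X, x, y


rng = random.Random(0)
X, x, y = noisy_samples(n=10, K=10, snr_db=10.0, rng=rng)
```

Since the WHT is its own inverse up to scaling, applying `fwht` to the clean samples and rescaling recovers the sparse spectrum exactly.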
[Figure: probability of success versus SNR for N = 16384. (c) NSO-SPRIGHT algorithm. (d) SO-SPRIGHT algorithm.]


Note that the sample complexity of the NSO-SPRIGHT algorithm is approximately a factor of n more than that of the
SO-SPRIGHT algorithm, and thus the recovery performance is better under the same experiment setup. However,
this is due to our simple choice of (3, 6)-regular LDPC codes for inducing the offsets in the SO-SPRIGHT algorithm,
which is far from capacity-achieving. Potentially one can use better LDPC code ensembles or even spatially
coupled LDPC codes to provide better performance in the low SNR regime. Here the (3, 6)-regular ensemble is
simply an example to showcase the algorithm.

7.1.2 Sample Complexity and Run-time Performance 

In this subsection, we compare the sample complexity and run-time performance of the NSO-SPRIGHT and
SO-SPRIGHT algorithms. The experiment settings are given below:

• input profile: we generate a sparse WHT vector of length $N = 2^n$ with K = 10, 20, 40 non-zero coefficients
respectively and vary n from n = 7 to n = 17. Therefore, the signal dimension spans from $N = 128 \approx 10^2$
to $N = 131072 \approx 0.1 \times 10^6$. The non-zero WHT coefficients are chosen with uniformly random support and
random amplitudes {±1}. The input signal samples x are obtained by taking the inverse WHT of the sparse
WHT vector and adding i.i.d. Gaussian noise samples with variance determined by the SNR = 10 dB.

• benchmark: as the signal length $N = 2^n$ varies, the algorithm parameters are fixed over 200 random
experiments. We record a data point only when the success probability exceeds 0.95.


8 Conclusions 

In this paper, we have proposed the SPRIGHT framework to compute a K-sparse N-point WHT, where the
NSO-SPRIGHT algorithm uses $O(K \log^2 N)$ samples and $O(K \log^3 N)$ operations while the SO-SPRIGHT algorithm
maintains the optimal sample scaling $O(K \log N)$ and complexity $O(K \log^2 N)$ as that of the noiseless case.
Our approach is based on strategic subsampling of the input noisy samples using a small set of randomly shifted
patterns that are carefully designed, which achieves a vanishing failure probability.
Appendices

A Proof of Theorem [1] 


From the oracle-based analysis proved in Appendix B, as long as $C \le 8$ groups and $B = O(K)$ are used, the oracle-based
peeling decoder succeeds with probability at least $1 - O(1/K)$ for $0 < \delta < 1$. It is further shown that with the proposed
bin detection routine using P observation sets (chosen differently) in each group, the peeling decoder continues
to succeed with probability at least $1 - O(1/K)$ in the presence of noise. Therefore, the sample complexity is
$M = CBP = O(KP)$. On the other hand, the computational complexities stem from two sources:

• The computation of B-point WHTs for subsampling: there are P observation sets in each group, where
each observation set requires a B-point WHT. Thus the total complexity is $O(PB \log B) = O(PK \log N)$,
where $K = O(N^{\delta})$ has been used;

• The bin detection routine in each peeling iteration for decoding: in the NSO-SPRIGHT scheme it is a
majority vote, which leads to a complexity of $O(P)$. In the SO-SPRIGHT scheme it requires the decoding
of a linear code formed by the P offsets. As mentioned, one can potentially use (spatially coupled) LDPC or
expander codes to achieve linear-time decoding $O(P)$, where P is the block length of the code. Therefore,
both sub-linear detection schemes result in a total complexity of $O(KP)$ throughout the $O(K)$ peeling
iterations.

Clearly, the complexity is dominated by the subsampling, $T = O(PK \log N)$. Substituting the corresponding P
required by the sub-linear bin detection routines in the NSO-SPRIGHT and the SO-SPRIGHT schemes, we arrive
at our stated results.
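For concreteness, the substitution can be written out as follows. The offset counts are taken from the experimental setup of Section 7.1 ($P = 2n \cdot n$ for NSO-SPRIGHT and $P = 2n + n + n$ for SO-SPRIGHT, with $n = \log_2 N$); reading them as the general orders $P = O(\log^2 N)$ and $P = O(\log N)$ is an assumption of this sketch.

```latex
\begin{align*}
\text{NSO-SPRIGHT:}\quad & P = O(\log^2 N)
  \;\Rightarrow\; M = O(KP) = O(K \log^2 N), \quad
  T = O(PK \log N) = O(K \log^3 N), \\
\text{SO-SPRIGHT:}\quad & P = O(\log N)
  \;\Rightarrow\; M = O(KP) = O(K \log N), \quad
  T = O(PK \log N) = O(K \log^2 N).
\end{align*}
```

These orders agree with the costs $6Kn^2$, $O(Kn^3)$, $12Kn$ and $O(Kn^2)$ quoted in the numerical experiments.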

B Proof of Theorem 1^ : Oracle-based Peeling Decoder Analysis 

B.1 Design and Analysis for the Very Sparse Regime $0 < \delta \le 1/3$

To keep our discussions general, we choose C subsampling groups and $B = 2^b$ with $b = \delta n$ such that $B = \eta K$
for some $\eta > 0$ and the subsampling matrices

$$\mathbf{M}_c = \left[\mathbf{0}_{(c-1)b \times b}^T,\ \mathbf{I}_{b \times b}^T,\ \mathbf{0}_{(n-cb) \times b}^T\right]^T, \quad c \in [C], \tag{42}$$

which freezes an $(n - b)$-bit segment of the time domain indices $\mathbf{m} \in \mathbb{F}_2^n$ to all zeros*. Then, each left node labeled
$\mathbf{k} \in \mathbb{F}_2^n$ is connected to a right node labeled $\mathbf{j} \in \mathbb{F}_2^b$ determined by the aliasing pattern $\mathbf{M}_c^T \mathbf{k} = \mathbf{j}$. Therefore, the
graph ensemble $\mathcal{G}(K, \eta, C, \{\mathbf{M}_c\}_{c \in [C]})$ in Definition [1] is consistent with the "balls-and-bins" model, where the
k-th ball (i.e. left node k) is thrown to bin $\mathbf{j} = \mathbf{M}_c^T \mathbf{k}$ in group c. Now we show that given the uniform support
distribution, the graph ensemble is further consistent with the random "balls-and-bins" model in each group.

We divide the index $\mathbf{k}$ into $C + 1$ segments as $\mathbf{k} = [\mathbf{k}_1^T, \mathbf{k}_2^T, \cdots, \mathbf{k}_{C+1}^T]^T$, where each of the first
C segments $\mathbf{k}_c = [k[cb], \cdots, k[(c-1)b+1]]^T$ for $c \in [C]$ contains b bits while the last segment $\mathbf{k}_{C+1} =
[k[n], \cdots, k[Cb+1]]^T$ contains the remaining $(n - Cb)$ bits. Then, the hash functions associated with the subsampling
matrices in (42) are $\mathcal{H}_c(\mathbf{k}) = \mathbf{M}_c^T \mathbf{k} = \mathbf{k}_c$, which sifts out the b-bit segment $\mathbf{k}_c$ independently out of n bits
from the index $\mathbf{k}$ in group c. We call the output of the hash function in each group the bit segmentation. Clearly,
these bit segmentations can be chosen differently according to the choice of subsampling matrices $\{\mathbf{M}_c\}_{c \in [C]}$. For
example, the bit segmentations in the first 3 groups are

$$\mathbf{j}_1 = \begin{bmatrix} k[1] \\ \vdots \\ k[b] \end{bmatrix}, \quad \mathbf{j}_2 = \begin{bmatrix} k[b+1] \\ \vdots \\ k[2b] \end{bmatrix}, \quad \mathbf{j}_3 = \begin{bmatrix} k[2b+1] \\ \vdots \\ k[3b] \end{bmatrix}. \tag{43}$$


* The reason for $\delta = 1/3$ to be the separation point between the very sparse regime and the less sparse regime will become clear in
the following section, where $C \ge 3$ is proven necessary for successful decoding with high probability. With the requirement
$C \ge 3$ and the constraint $Cb \le n$ due to the choice of $\mathbf{M}_c$, we have $b = \delta n$ and therefore $\delta \le 1/3$.
Since each element $\mathbf{k}$ of the support set $\mathcal{K}$ is chosen independently and uniformly at random from $\mathbb{F}_2^n$ by
Assumption [1], each bit segmentation $\mathbf{j}_c = \mathcal{H}_c(\mathbf{k})$ is independently and uniformly chosen from $\{0,1\}^b$ for each ball.
Therefore, each left ball is thrown independently into the bins on the right, which suggests that the edges from
each left node to each right node are connected independently. Further, the bin index in each group contains bit
segments in $\mathbf{k}$ that are uniformly distributed, and hence each ball is thrown uniformly at random to one of the B
right nodes in that group.
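The bit-segmentation hash of the very sparse regime is simple enough to sketch directly. In the Python below, an index $\mathbf{k}$ is stored as a list whose entry `i - 1` holds the bit $k[i]$; this storage convention and the function names are assumptions made for illustration.

```python
def segment_hash(k_bits, c, b):
    """Bin index of ball k in group c (1-indexed): the bits k[(c-1)b+1], ..., k[cb]."""
    return tuple(k_bits[(c - 1) * b: c * b])


def throw_balls(support, C, b):
    """For each support element, list its bin in each of the C groups."""
    return [[segment_hash(k, c, b) for c in range(1, C + 1)] for k in support]


# Hypothetical example with n = 7 bits, b = 2 bits per bin index, C = 3 groups.
k = [1, 0, 1, 1, 0, 0, 1]
print(throw_balls([k], C=3, b=2))
```

Because the C groups read disjoint b-bit segments, the bins of a uniformly drawn index are independent across groups, which is the property used in the balls-and-bins argument above.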

In the following, we show that if the redundancy parameter $\eta = B/K$ is chosen appropriately for the graph
ensemble $\mathcal{G}(K, \eta, C, \{\mathbf{M}_c\}_{c \in [C]})$ with C subsampling groups and $\mathbf{M}_c$ chosen as (42), then given the oracle, all the
edges of the graph can be peeled off in $O(K)$ peeling iterations with high probability.

Proposition 4 (Oracle-based Peeling Decoder Performance for $0 < \delta \le 1/3$). If we use C = 3 groups with
the set size $B = 0.4073K$, where the subsampling matrices $\mathbf{M}_c$ for each group are chosen as in (42), the induced
graph ensemble $\mathcal{G}(K, \eta, C, \{\mathbf{M}_c\}_{c \in [C]})$ guarantees that the oracle-based peeling decoder peels off all the edges
in $O(K)$ iterations with probability at least $1 - O(1/K)$.

Proof. The proof is given in the following subsections. □


B.1.1 Density Evolution

Density evolution, a powerful tool in modern coding theory, tracks the average density of remaining edges that
are not decoded after a fixed number of peeling iterations $i > 0$. We introduce the concept of directed neighborhood
of a certain edge in the bipartite graph up to depth $\ell = 2i$. This concept is important in the density evolution
analysis since the peeling of an edge in the i-th iteration depends solely on the removal of the edges from this
neighborhood in the previous $i - 1$ iterations. The directed neighborhood $\mathcal{N}_e^{\ell}$ at depth $\ell$ of a certain edge $e = (v, c)$
is defined as the induced sub-graph containing all the edges and nodes on paths $e_1, \cdots, e_{\ell}$ starting at a variable
node v (left node) such that $e_1 \neq e$. An example of a directed neighborhood of depth $\ell = 2$ is given in Fig. 9.

To analyze the performance of the peeling decoder over the bipartite graph, we need to understand the edge 
degree distributions on the left and right of the bipartite graph. Since the left edge degree distribution is already 
known due to the regularity of the graph ensemble induced by subsampling, next we study the right edge degree 
distribution. 





Figure 9: Directed neighborhood of depth 2 of an edge $e = (v, c)$. The
dashed lines correspond to nodes/edges removed at the end of iteration
i. The edge between v and c can be potentially removed at iteration
$i + 1$ as one of the check nodes $c'$ is a singleton (it has no more variable
nodes remaining at the end of iteration i).


Lemma 1. Let $\rho_j$ be the fraction of edges in the bipartite graph connecting to right nodes with degree j. In the
very sparse regime $0 < \delta \le 1/3$, if we use C subsampling groups with subsampling matrices $\{\mathbf{M}_c\}_{c \in [C]}$ chosen
as (42), the edge degree sequence $\rho_j$ of the graph ensemble $\mathcal{G}(K, \eta, C, \{\mathbf{M}_c\}_{c \in [C]})$ is obtained as

$$\rho_j = \frac{(1/\eta)^{j-1} e^{-1/\eta}}{(j-1)!}. \tag{44}$$

Proof. See Appendix B.4. □

Now let us consider the local neighborhood of an arbitrary edge $e = (v, c)$ with a left regular degree d and
right degree distribution given by $\{\rho_j\}_{j \ge 1}$. If the sub-graph corresponding to the neighborhood $\mathcal{N}_e^{2i}$ of the edge
$e = (v, c)$ is a tree, namely cycle-free, then the peeling procedures over different bins in the first i iterations are
independent, which can greatly simplify our analysis. Density evolution analysis is based on the assumption that
this neighborhood is cycle-free (tree-like), and we will prove later (in the next subsection) that all graphs in the
regular ensemble behave like a tree when N and K are large and hence the actual performance concentrates
well around the density evolution result.

Let $p_i$ be the probability of this edge being present in the bipartite graph after $i > 0$ peeling iterations. If
the neighborhood is a tree as in Fig. 9, the probability $p_i$ can be written with respect to the probability $p_{i-1}$ at
the previous depth in a recursive manner, $p_i = \left(1 - \sum_j \rho_j (1 - p_{i-1})^{j-1}\right)^{d-1}$ for $i = 1, 2, 3, \cdots$. The term
$\sum_j \rho_j (1 - p_{i-1})^{j-1}$ can be approximated using the right degree generating polynomial

$$\rho(x) := \sum_j \rho_j x^{j-1} = e^{-(1-x)/\eta}, \tag{45}$$

where we have used (44) to derive the second expression. Therefore, with left degree $d = C$, the density evolution
equation for our peeling decoder can be obtained as

$$p_i = \left(1 - e^{-p_{i-1}/\eta}\right)^{C-1}, \quad i = 1, 2, 3, \cdots \tag{46}$$

Clearly, the probability $p_i$ can be made arbitrarily small for a sufficiently large but finite $i > 0$ as long as C and $\eta$
are chosen properly. One can find the minimum value $\eta$ for a given C to guarantee $p_i < p_{i-1}$, which is shown in
Table [1]. Due to lack of space we only show up to C = 6.


C     2        3        4        5        6
η     1.0000   0.4073   0.3237   0.2850   0.2616
Cη    2.0000   1.2219   1.2948   1.4250   1.5696

Table 1: Minimum value for η given the number of groups C
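The thresholds in Table 1 can be reproduced numerically. The sketch below assumes the density evolution recursion has the form $p_i = (1 - e^{-p_{i-1}/\eta})^{C-1}$; requiring $p_i < p_{i-1}$ for every value $x \in (0, 1)$ of $p_{i-1}$ is then equivalent to $\eta > x / (-\ln(1 - x^{1/(C-1)}))$, so the minimum redundancy is the supremum of that expression. The grid search and the function names are illustrative choices, not the authors' procedure.

```python
import math


def min_redundancy(C, grid=100000):
    """Smallest eta with (1 - exp(-x / eta))^(C - 1) < x for all x in (0, 1)."""
    best = 0.0
    for i in range(1, grid):
        x = i / grid
        bound = x / -math.log(1.0 - x ** (1.0 / (C - 1)))
        best = max(best, bound)
    return best


def density_evolution(C, eta, iters=200):
    """Iterate the assumed recursion starting from p_0 = 1."""
    p = 1.0
    for _ in range(iters):
        p = (1.0 - math.exp(-p / eta)) ** (C - 1)
    return p


for C in range(2, 7):
    eta = min_redundancy(C)
    print(C, round(eta, 4), round(C * eta, 4))
```

Running the recursion with a redundancy above the threshold drives $p_i$ towards zero, while a redundancy below it leaves $p_i$ stuck at a non-zero fixed point.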

Lemma 2 (Density evolution for $0 < \delta \le 1/3$). Let $\mathcal{G}(K, \eta, C, \{\mathbf{M}_c\}_{c \in [C]})$ be the graph ensemble induced by
subsampling with C subsampling groups using subsampling matrices $\{\mathbf{M}_c\}_{c \in [C]}$ in (42) in the very sparse regime
$0 < \delta \le 1/3$, where the number of groups C and the redundancy parameter $\eta$ are chosen from Table [1]. Denote by $\mathcal{T}_i$
the event where the local 2i-neighborhood of every edge in the graph is tree-like and let $Z_i$ be the total number
of edges that are not decoded after i (an arbitrarily large but fixed) peeling iterations. For any $\varepsilon > 0$, there exists
a finite number of iterations $i > 0$ such that

$$\mathbb{E}[Z_i \mid \mathcal{T}_i] \le KC\varepsilon/4, \tag{47}$$

where the expectation is taken with respect to the random graph ensemble $\mathcal{G}(K, \eta, C, \{\mathbf{M}_c\}_{c \in [C]})$.

Proof. See Appendix B.5. □

Based on this lemma, we can see that if the bipartite graph has a local neighborhood that is tree-like up to depth
2i for every edge, the peeling decoder on average peels off all but an arbitrarily small fraction of the edges.

B.1.2 Convergence to Density Evolution 

Given the mean performance analysis (in terms of the number of undecoded edges) over cycle-free graphs, now 
we provide a concentration analysis on the number of the undecoded edges Zi for any graph from the ensemble 
$\mathcal{G}(K, \eta, C, \{\mathbf{M}_c\}_{c \in [C]})$ at the i-th iteration, by showing that $Z_i$ converges to the density evolution.


Lemma 3 (Convergence to density evolution for $0 < \delta \le 1/3$). Over the probability space of all graphs from
$\mathcal{G}(K, \eta, C, \{\mathbf{M}_c\}_{c \in [C]})$, let $p_i$ be as given in the density evolution (46). Given any $\varepsilon > 0$ and a sufficiently large
K, there exists a constant $c > 0$ such that

(i) $\mathbb{E}[Z_i] < KC\varepsilon/2$ (48)

(ii) $\Pr\left(|Z_i - \mathbb{E}[Z_i]| > KC\varepsilon/2\right) < 2\exp\left(-c\,\varepsilon^2 K^{\frac{1}{4i+1}}\right)$ (49)

Proof. We provide a concentration analysis in Appendix B.6 on the number of the remaining edges for an arbitrary
graph from the ensemble by showing that $Z_i$ converges to the mean analysis result. Here is a sketch of the proof:

• Mean analysis on general graphs from ensembles: first, we use a counting argument similar to [23] to show
that any random graph from the ensemble $\mathcal{G}(K, \eta, C, \{\mathbf{M}_c\}_{c \in [C]})$ behaves like a tree with high probability.
Therefore, the expected number of remaining edges can be made arbitrarily close to the mean analysis,
$|\mathbb{E}[Z_i] - \mathbb{E}[Z_i \mid \mathcal{T}_i]| < KC\varepsilon/4$, such that $\mathbb{E}[Z_i] < KC\varepsilon/2$ if N and K are greater than some constants.

• Concentration to mean by large deviation analysis: we use a Doob martingale argument as in [22] to show
that the actual number of remaining edges $Z_i$ concentrates well around its mean $\mathbb{E}[Z_i]$ with an exponential
tail in K such that $\Pr\left(|Z_i - \mathbb{E}[Z_i]| > KC\varepsilon/2\right) < 2\exp\left(-c_4\,\varepsilon^2 K^{\frac{1}{4i+1}}\right)$ for some constant $c_4 > 0$.

□


B.1.3 Complete Decoding through Graph Expanders 

From previous analyses, it has already been established that with high probability, our peeling decoder terminates
with an arbitrarily small fraction of edges undecoded,

$$Z_i < KC\varepsilon, \quad \forall \varepsilon > 0, \tag{50}$$

where the left degree is $d = C$. In this section, we show that all the undecoded edges can be completely decoded
if the sub-graph consisting of the remaining undecoded edges is a "good-expander". Since there are many
notions of "graph expanders", we introduce the concept of graph expander with respect to the graph ensemble
$\mathcal{G}(K, \eta, C, \{\mathbf{M}_c\}_{c \in [C]})$ in this paper, which is induced by subsampling.

Definition 5 (Graph Expander). A C-regular graph with K left nodes and C subsampling groups of $B = \eta K$
right nodes is called an $(\varepsilon, 1/2, C)$-expander if for all subsets S of left nodes with $|S| \le \varepsilon K$, there exists a right
neighborhood in some group c, denoted by $\mathcal{N}_c(S)$, that satisfies $|\mathcal{N}_c(S)| > |S|/2$ for some $c \in [C]$.

Lemma 4 (Graph expansion property for $0 < \delta \le 1/3$). In the very sparse regime $0 < \delta \le 1/3$, if we
use $C \ge 3$ groups with subsampling matrices $\{\mathbf{M}_c\}_{c \in [C]}$ chosen as (42), then any graph from the ensemble
$\mathcal{G}(K, \eta, C, \{\mathbf{M}_c\}_{c \in [C]})$ is an $(\varepsilon, 1/2, C)$-expander with probability at least $1 - O(1/K)$ for some sufficiently small
but constant $\varepsilon > 0$.

Proof. See Appendix B.7. □

Without loss of generality, let the $Z_i$ undecoded edges be connected to a set of left nodes S. Since each left
node has degree C, it is obvious from (50) that $|S| \le K\varepsilon$ with high probability. Note that our peeling decoder
fails to decode the set S of left nodes if and only if there are no more single-ton right nodes in the neighborhood of
S. A sufficient condition for the right nodes in at least one group $\mathcal{N}_c(S)$ to have at least one single-ton is that
the corresponding average degree is less than 2, which implies that $|S|/|\mathcal{N}_c(S)| < 2$ and hence $|\mathcal{N}_c(S)| > |S|/2$.
Since we have shown in Lemma 4 that any graph from the regular ensemble $\mathcal{G}(K, \eta, C, \{\mathbf{M}_c\}_{c \in [C]})$ is an
$(\varepsilon, 1/2, C)$-expander with high probability, such that there is at least one group with $|\mathcal{N}_c(S)| > |S|/2$ for some c,
there will be sufficient single-tons to peel off all the remaining edges.
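The peeling argument can be made concrete with a small oracle-based decoder over an explicit balls-and-bins graph. In the sketch below, `bins_per_ball[k][c]` is the bin of ball k in group c, and the decoder is assumed to be told which bins are single-tons (the oracle); the function name and the toy graphs are hypothetical.

```python
from collections import defaultdict


def peel(bins_per_ball):
    """Oracle-based peeling: repeatedly remove balls that sit alone in some bin.

    Returns the set of balls that remain undecoded when no single-ton is left.
    """
    remaining = set(range(len(bins_per_ball)))
    progress = True
    while progress and remaining:
        progress = False
        occupancy = defaultdict(list)
        for k in remaining:
            for c, j in enumerate(bins_per_ball[k]):
                occupancy[(c, j)].append(k)
        for members in occupancy.values():
            if len(members) == 1 and members[0] in remaining:
                remaining.discard(members[0])  # single-ton bin: peel its ball
                progress = True
    return remaining


# Two groups; ball 1 only becomes a single-ton after balls 0 and 2 are peeled.
print(peel([(0, 0), (0, 1), (1, 1)]))
# Two balls colliding in every group can never be peeled.
print(peel([(0, 0), (0, 0)]))
```

The second example is the kind of stopping set that the expander property rules out with high probability: a set S of left nodes whose neighborhood in every group contains no single-ton.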
B.2 Design and Analysis of a Specific Less Sparse Regime $\delta = 1 - 1/C$

From now on, we address the design and analysis for the less sparse regime $1/3 < \delta < 1$. For convenience, we start
by discussing the case $\delta = 1 - 1/C$ where C is the number of subsampling groups. Then, we generalize our design
in Section B.3 to tackle arbitrary sparsities $\delta \in (1/3, 1)$ using the basic constructions for sparsity $\delta = 1 - 1/C$.
We let $t = n/C$ such that $B = 2^b$ with $b = (C-1)t$ and $B = \eta K$ for some $\eta > 0$. The subsampling matrices are
chosen differently by

$$\mathbf{M}_c = \begin{bmatrix} \mathbf{I}_{(c-1)t \times (c-1)t} & \mathbf{0}_{(c-1)t \times (C-c)t} \\ \mathbf{0}_{t \times (c-1)t} & \mathbf{0}_{t \times (C-c)t} \\ \mathbf{0}_{(C-c)t \times (c-1)t} & \mathbf{I}_{(C-c)t \times (C-c)t} \end{bmatrix}, \quad c \in [C], \tag{51}$$

which freezes a t-bit segment of the time domain indices $\mathbf{m} \in \mathbb{F}_2^n$ to all zeros.


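B.2.1 Random Graph Ensemble in the Less Sparse Regime $\delta = 1 - 1/C$

For convenience, we divide $\mathbf{k} = [\mathbf{k}_1^T, \cdots, \mathbf{k}_C^T]^T$ into C pieces of n/C-bit segments with

$$\mathbf{k}_c = [k[cn/C], \cdots, k[(c-1)n/C + 1]]^T. \tag{52}$$

Then in this regime, the hash functions associated with (51) are defined as

$$\mathcal{H}_c(\mathbf{k}) = \mathbf{M}_c^T \mathbf{k} = [\mathbf{k}_1^T, \cdots, \mathbf{k}_{c-1}^T, \mathbf{k}_{c+1}^T, \cdots, \mathbf{k}_C^T]^T, \quad c \in [C], \tag{53}$$

which produces a bit segmentation that sifts out all but one segment $\mathbf{k}_c$ cyclically. Using this set of subsampling
matrices (i.e. hash functions), the graph ensemble $\mathcal{G}(K, \eta, C, \{\mathbf{M}_c\}_{c \in [C]})$ in Definition [1] is also consistent with
the "balls-and-bins" model. For example, when C = 3 and $\delta = 2/3$ such that $t = n/3$, the subsampling matrices
are chosen as

$$\mathbf{M}_1 = \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}, \quad \mathbf{M}_2 = \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}, \quad \mathbf{M}_3 = \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}, \tag{54}$$

where every block $\mathbf{I}$ and $\mathbf{0}$ is of size $\frac{n}{3} \times \frac{n}{3}$, and the bin indices corresponding to the ball $\mathbf{k}$ in the 3 groups are given by

$$\mathbf{j}_1 = \begin{bmatrix} k[n/3+1] \\ \vdots \\ k[n] \end{bmatrix}, \quad \mathbf{j}_2 = \begin{bmatrix} k[1] \\ \vdots \\ k[n/3] \\ k[2n/3+1] \\ \vdots \\ k[n] \end{bmatrix}, \quad \mathbf{j}_3 = \begin{bmatrix} k[1] \\ \vdots \\ k[2n/3] \end{bmatrix}. \tag{55}$$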


Same as the very sparse case, since each bit segmentation $\mathbf{j}_c = \mathcal{H}_c(\mathbf{k})$ is chosen independently and uniformly at random
from $\mathbb{F}_2^b$ by Assumption [1], the bit patterns $k[i]$ for $i \in [n]$ are independently and uniformly chosen from {0,1}
for each ball. Therefore, each left ball is thrown independently into the bins on the right, which suggests that the
edges from each left node to each right node are connected independently. Further, the bin index $\mathbf{j}_c$ in each group
contains bit segments in $\mathbf{k}$ that are uniformly distributed, and hence each ball is thrown uniformly at random to one
of the B right nodes in that group. Therefore, due to the independence and uniformity of the support distribution
$\mathbf{k}$, the graph ensemble is consistent with the random "balls-and-bins" model in each group.
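A short sketch of the hash in this regime, under the reading that group c keeps every segment of $\mathbf{k}$ except the c-th, is given below. The list-based bit layout (segment 1 first) and the function name are assumptions made for illustration.

```python
def drop_one_hash(k_bits, c, C):
    """Bin index in group c (1-indexed): all n/C-bit segments of k except segment c."""
    t = len(k_bits) // C
    return tuple(k_bits[: (c - 1) * t] + k_bits[c * t:])


# Hypothetical example with n = 6 bits and C = 3 groups (t = 2 bits per segment).
k1 = [1, 0, 1, 1, 0, 1]
k2 = [0, 1, 1, 1, 0, 1]   # differs from k1 only in segment 1

for c in (1, 2, 3):
    print(c, drop_one_hash(k1, c, 3), drop_one_hash(k2, c, 3))
```

Two indices that differ only in segment c collide in group c but are separated in every other group. Since the groups share segments, the bins of one ball are no longer independent across groups, which is why the expansion property needs a separate argument in this regime.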
B.2.2 Peeling Decoder over the Graph Ensemble in the Less Sparse Regime $\delta = 1 - 1/C$


The analysis of the peeling decoder in the less sparse regime depends on the graphs induced by subsampling. Note 
that the key difference of the graphs associated with the less sparse case from the very sparse case is that for each 
ball k, although the edges are connected uniformly and independently to B bins in each group, they are no longer 
connected independently across different groups. However, since the graph ensemble is consistent with the
"balls-and-bins" model in each group, it can be easily shown that the density evolution analysis and concentration analysis
carry over to the less sparse regime based on the analysis in Section B.1. However, there are some key differences
in the graph expansion properties due to the lack of independence across different groups. In this section, we focus 
on proving the graph expansion properties for the graph ensemble in the less sparse regime. 


Lemma 5 (Graph expansion property for $\delta = 1 - 1/C$). In the less sparse regime $\delta = 1 - 1/C$, if we use $C \ge 3$
groups with subsampling matrices $\{\mathbf{M}_c\}_{c \in [C]}$ chosen as (51) and $B = \eta K$ chosen with respect to the number of
groups C according to Table [1], then any graph from the ensemble $\mathcal{G}(K, \eta, C, \{\mathbf{M}_c\}_{c \in [C]})$ is an $(\varepsilon, 1/2, C)$-expander
with probability at least $1 - O\left(1/K^{(2^C - 2C)/(C-1)}\right)$ for some sufficiently small but constant $\varepsilon > 0$.

Proof. To show that the graph ensemble in the less sparse regime is an $(\varepsilon, 1/2, C)$-expander as defined in Definition
5, we need to show that irrespective of the inter-dependence of the edges across different groups, any subset S of
left nodes has at least one right neighborhood $\mathcal{N}_c(S)$ in one group such that $|\mathcal{N}_c(S)| > |S|/2$. It has been
shown in the very sparse regime in Lemma 4 that the bottleneck event for graph expansion is when the size of the
set is constant, $|S| = O(1)$. Therefore, in the following, we show that for any given subset S of left nodes with size
$|S| = s = O(1)$, their right neighborhoods will not be multi-tons with high probability.

Given an arbitrary left node with the following bit segments

$$\mathbf{k} = [k[n], \cdots, k[1]]^T, \quad \mathbf{k}_c := [k[ct], \cdots, k[(c-1)t+1]]^T, \quad c \in [C], \tag{56}$$

its right neighbors are all multi-tons if and only if there exists at least another left node labeled $\mathbf{k}'$ in each group
$c \in [C]$ such that $\mathcal{H}_c(\mathbf{k}) = \mathcal{H}_c(\mathbf{k}')$. For a pathological set S where $\mathcal{H}_c(\mathbf{k}) = \mathcal{H}_c(\mathbf{k}')$ for any distinct pair
$\mathbf{k} \neq \mathbf{k}' \in S$, the left node labels $\mathbf{k}$ and $\mathbf{k}'$ differ from each other only in one segment:

$$\mathbf{k}_{c^*} \neq \mathbf{k}'_{c^*}, \quad \text{for some } c^* \in [C] \tag{57}$$

$$\mathbf{k}_c = \mathbf{k}'_c, \quad \text{for all } c \neq c^*. \tag{58}$$

Since there are at least 2 such nodes for each group $c \in [C]$ to form multi-tons, the size of the pathological set
$|S| = s$ satisfies $s \ge 2^C$. Let us consider the augmented worst case scenario where there are $s/2^{C-1}$ left nodes
satisfying the pathological set requirements in (57) in one group (assuming there are only 2 such nodes in the other
$C - 1$ groups). For all the nodes $\mathbf{k} \in S$, the total possible number of left nodes that can differ in one segment
$\mathbf{k}_c$ for some $c \in [C]$ is $2^t$, and therefore the probability of having $s/2^{C-1}$ nodes from that space is 2 ^ 31 / 2 ^ In
order for an arbitrary set of $s/2^{C-1}$ left nodes to land in the same bin on the right in all C subsampling groups, the
probability can be obtained as



Let $F = 2^t$, then the probability of this event can be obtained readily for any size s as
Using the inequality $\binom{a}{b} \le (ae/b)^b$, we have

Since the pathological set satisfies $s \ge 2^C$ and $K = O(F^{C-1})$, we can further bound the probability as

$$\Pr(S) = O\left(\frac{1}{F^{2^C - 2C}}\right) \tag{62}$$

$$= O\left(\frac{1}{K^{(2^C - 2C)/(C-1)}}\right). \tag{63}$$

□


B.3 Generalized Design to Arbitrary Sparsity Regime $0 < \delta < 1$


As of now, we have presented the subsampling design for the very sparse regime $0 < \delta \le 1/3$ and partly for the
less sparse regime $\delta = 1 - 1/C$, i.e. $\delta = 2/3, 3/4, 4/5, 5/6, \cdots$ for integer $C \ge 3$. However, it does not generalize
to any sparsity $0 < \delta < 1$. In this section, we continue to show that using the basic constructions above, we can
achieve any sparsity regime. The main idea of extending our subsampling design to an arbitrary sparsity is the
following:


• Hash with Common Prefix: for example, we want to design the subsampling pattern for sparsity $\delta = (1 +
a)/(3 + a)$ for some $a > 0$. Clearly, by varying $a \in (0, \infty)$, one can obtain an arbitrary sparsity $\delta \in (1/3, 1)$.
However, we hereby note that this construction is not universal since beyond some $a^*$, the sparse bipartite
graph constructed by this design fails to work with high probability. We will show later how to determine
such a threshold $a^*$ and how to achieve sparsity beyond that point. In the following, we will proceed with this
example.

We divide the bin index $\mathbf{k}$ into 4 segments $\mathbf{k} = [\mathbf{k}_1^T, \mathbf{k}_2^T, \mathbf{k}_3^T, \mathbf{k}_4^T]^T$, where $\mathbf{k}_1$, $\mathbf{k}_2$ and $\mathbf{k}_3$ are of equal length
containing $b_c = n/(3 + a)$ bits for c = 1, 2, 3, while $\mathbf{k}_4$ contains $b_4 = an/(3 + a)$ bits. The hash function
in each group is then designed with the following bit segmentation:

$$\mathcal{H}_c(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_c \\ \mathbf{k}_4 \end{bmatrix}, \quad c = 1, 2, 3. \tag{64}$$

In this way, the output of the hash has b bits with

$$b = b_c + b_4 = \frac{n}{3+a} + \frac{an}{3+a} = \frac{1+a}{3+a}\, n = \delta n, \tag{65}$$

and hence we have $B = 2^b = \eta K = O(N^{\delta})$ for some appropriately chosen $\eta$. We refer to this generalized
hash design as common-prefix since the hash outputs start with the same segment $\mathbf{k}_4$.

• Union of Disjoint Sparse Bipartite Graphs: using the generalized hash designed above, the sparse bipartite
graph is still consistent with the balls-and-bins model, where there are B right nodes and K left
nodes. Furthermore, since the right node of the graph is indexed by two segments $(\mathbf{k}_4, \mathbf{k}_c)$, the resulting
bipartite graph can be viewed as $2^{b_4}$ disjoint unions of sparse bipartite graphs with $K/2^{b_4}$ left nodes and
$B/2^{b_4} = 2^{b_c}$ right nodes. In other words, we have $2^{b_4}$ disjoint unions of graphs from the random graph
ensemble $\mathcal{G}(K/2^{b_4}, 0.4073, 3, \{\mathbf{M}_c\}_{c=1,2,3})$, the decoding of which fails with probability $O(1/(K/2^{b_4})) =
O(1/2^{b_c})$. Therefore, by a union bound, the failure probability of peeling decoding over the bipartite graphs
given by this design is

$$O\left(\frac{1}{2^{b_c}}\right) \times 2^{b_4} = O\left(\frac{1}{2^{b_c - b_4}}\right) = O\left(\frac{1}{2^{\frac{1-a}{3+a} n}}\right) = O\left(\frac{1}{N^{\frac{1-a}{3+a}}}\right) = O\left(\frac{1}{K^{\frac{1-a}{1+a}}}\right). \tag{66}$$

Clearly, it is required that $a < a^* = 1$ such that the failure probability approaches zero asymptotically in K.
This implies a sparsity regime $\delta = (1 + a)/(3 + a) < (1 + a^*)/(3 + a^*) = 1/2$. Therefore, this example
only works for sparsity $1/3 < \delta < 1/2$. In the following, we provide specific constructions that cover the
entire sparsity regime $0 < \delta < 1$.
B.3.1 Achieving Intermediate Sparsity $0 < \delta \le 1/3$

The design in Section B.1 can be used directly and hence we omit the discussions here.


B.3.2 Achieving Intermediate Sparsity $1/3 < \delta < 0.73$

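Here we target sparsity $\delta = (2 + a)/(6 + a)$, which starts from $\delta = 1/3$ with a = 0 and ends at $\delta = 0.73$ with
a = 8.81. To achieve such sparsity, we divide the bin index $\mathbf{k}$ into 7 segments

$$\mathbf{k} = [\mathbf{k}_1^T, \mathbf{k}_2^T, \mathbf{k}_3^T, \mathbf{k}_4^T, \mathbf{k}_5^T, \mathbf{k}_6^T, \mathbf{k}_7^T]^T, \tag{67}$$

where $\mathbf{k}_1, \mathbf{k}_2, \mathbf{k}_3, \mathbf{k}_4, \mathbf{k}_5$ and $\mathbf{k}_6$ are of equal length containing $b_c = n/(6 + a)$ bits for $c = 1, 2, \cdots, 6$, while $\mathbf{k}_7$
contains $b_7 = an/(6 + a)$ bits. Therefore, we have C = 6 groups for subsampling, and the hash function in each
group is designed with the following bit segmentation:

$$\mathcal{H}_1(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_2 \\ \mathbf{k}_3 \\ \mathbf{k}_7 \end{bmatrix}, \quad \mathcal{H}_2(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_1 \\ \mathbf{k}_3 \\ \mathbf{k}_7 \end{bmatrix}, \quad \mathcal{H}_3(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_1 \\ \mathbf{k}_2 \\ \mathbf{k}_7 \end{bmatrix}, \tag{68}$$

$$\mathcal{H}_4(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_5 \\ \mathbf{k}_6 \\ \mathbf{k}_7 \end{bmatrix}, \quad \mathcal{H}_5(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_4 \\ \mathbf{k}_6 \\ \mathbf{k}_7 \end{bmatrix}, \quad \mathcal{H}_6(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_4 \\ \mathbf{k}_5 \\ \mathbf{k}_7 \end{bmatrix}. \tag{69}$$

In this way, the output of the hash has b bits with

$$b = 2b_c + b_7 = \frac{2n}{6+a} + \frac{an}{6+a} = \frac{2+a}{6+a}\, n = \delta n. \tag{70}$$

According to Table [1], we need to choose $B = 0.2616K$. Using the same analysis outlined before, we can show
that the peeling decoder works with probability at least $1 - O(1/K)$.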


B.3.3 Achieving Intermediate Sparsity $0.73 < \delta < 7/8$

Here we target sparsity $\delta = (3+a)/(8+a)$, which starts from $\delta = 0.73$ with $a = 10.52$ and ends at $\delta = 7/8$ with
$a = 32$. To achieve such sparsity, we divide the bin index $\mathbf{k}$ into 9 segments

$$\mathbf{k} = [\mathbf{k}_1^T, \mathbf{k}_2^T, \mathbf{k}_3^T, \mathbf{k}_4^T, \mathbf{k}_5^T, \mathbf{k}_6^T, \mathbf{k}_7^T, \mathbf{k}_8^T, \mathbf{k}_9^T]^T, \tag{71}$$

where $\mathbf{k}_1, \mathbf{k}_2, \cdots, \mathbf{k}_8$ are of equal length containing $b_c = n/(8+a)$ bits for $c = 1, 2, \cdots, 8$,
while $\mathbf{k}_9$ contains $b_9 = an/(8+a)$ bits. Therefore, we have $C = 8$ groups for subsampling, and the hash function
in each group is designed with the following bit segmentation:



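$$\mathcal{H}_1(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_2 \\ \mathbf{k}_3 \\ \mathbf{k}_4 \\ \mathbf{k}_9 \end{bmatrix}, \quad \mathcal{H}_2(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_1 \\ \mathbf{k}_3 \\ \mathbf{k}_4 \\ \mathbf{k}_9 \end{bmatrix}, \quad \mathcal{H}_3(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_1 \\ \mathbf{k}_2 \\ \mathbf{k}_4 \\ \mathbf{k}_9 \end{bmatrix}, \quad \mathcal{H}_4(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_1 \\ \mathbf{k}_2 \\ \mathbf{k}_3 \\ \mathbf{k}_9 \end{bmatrix}, \tag{72}$$

$$\mathcal{H}_5(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_6 \\ \mathbf{k}_7 \\ \mathbf{k}_8 \\ \mathbf{k}_9 \end{bmatrix}, \quad \mathcal{H}_6(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_5 \\ \mathbf{k}_7 \\ \mathbf{k}_8 \\ \mathbf{k}_9 \end{bmatrix}, \quad \mathcal{H}_7(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_5 \\ \mathbf{k}_6 \\ \mathbf{k}_8 \\ \mathbf{k}_9 \end{bmatrix}, \quad \mathcal{H}_8(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_5 \\ \mathbf{k}_6 \\ \mathbf{k}_7 \\ \mathbf{k}_9 \end{bmatrix}. \tag{73}$$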


In this way, the output of the hash has $b$ bits with

$$b = 3b_c + b_9 = \frac{3n}{8+a} + \frac{an}{8+a} = \frac{3+a}{8+a}\, n = \delta n. \tag{74}$$

According to Table [T] we need to choose $B = 0.2336K$. Using the same analysis outlined before, we can show
that the peeling decoder works with probability at least $1 - O(1/K)$.

B.3.4 Achieving Intermediate Sparsity $7/8 < \delta < 1$

The sparsity index $\delta$ in the range $0.875 < \delta < 1$ can be achieved by the combination of designs proposed in the
less sparse regime for an increasing (but constant) number of groups $C$, as dictated by $\delta = 1 - 1/C$. For example, we
can target the sparsity setting $\delta = (7+a)/(8+a)$, starting from $\delta = 0.875$ with $a = 0$ and until $\delta = 0.99$. In this
construction, we divide the bin index $\mathbf{k}$ into 9 segments

$$\mathbf{k} = [\mathbf{k}_1^T, \mathbf{k}_2^T, \mathbf{k}_3^T, \mathbf{k}_4^T, \mathbf{k}_5^T, \mathbf{k}_6^T, \mathbf{k}_7^T, \mathbf{k}_8^T, \mathbf{k}_9^T]^T, \tag{75}$$

where $\mathbf{k}_1, \mathbf{k}_2, \cdots, \mathbf{k}_8$ are of equal length containing $b_c = n/(8+a)$ bits for $c = 1, 2, \cdots, 8$,
while $\mathbf{k}_9$ contains $b_9 = an/(8+a)$ bits. The hash function in each group is then designed with the following bit
segmentation:


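$$\mathcal{H}_c(\mathbf{k}) = \begin{bmatrix} \mathbf{k}_1 \\ \vdots \\ \mathbf{k}_{c-1} \\ \mathbf{k}_{c+1} \\ \vdots \\ \mathbf{k}_8 \\ \mathbf{k}_9 \end{bmatrix}, \quad c = 1, \cdots, 8. \tag{76}$$

In this way, the output of the hash has $b$ bits with

$$b = 7b_c + b_9 = \frac{7n}{8+a} + \frac{an}{8+a} = \frac{7+a}{8+a}\, n = \delta n. \tag{77}$$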


According to Table [T] we need to choose $B = 0.2336K$. Using the same analysis outlined before, we can show
that the peeling decoder works with probability at least $1 - O(1/K)$.
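The bit segmentation of Section B.3.4 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: $a$ is taken to be an integer with $(8+a)$ dividing $n$, bits are stored in a plain list, and the helper names `segment_lengths`, `split_index`, `hash_group` and `sparsity` are hypothetical, not part of the original design.

```python
def segment_lengths(n, a, num_equal=8):
    """Return (b_c, b_last) in bits, assuming (num_equal + a) divides n."""
    total = num_equal + a
    if n % total != 0:
        raise ValueError("n must be divisible by num_equal + a")
    b_c = n // total
    return b_c, a * b_c


def split_index(k_bits, n, a, num_equal=8):
    """Split a length-n bit list into num_equal short segments plus one long one."""
    b_c, _ = segment_lengths(n, a, num_equal)
    segs = [k_bits[i * b_c:(i + 1) * b_c] for i in range(num_equal)]
    segs.append(k_bits[num_equal * b_c:])  # the long segment (k_9 when num_equal = 8)
    return segs


def hash_group(k_bits, c, n, a, num_equal=8):
    """Hash for group c (1-based): drop segment k_c and keep the rest in order."""
    segs = split_index(k_bits, n, a, num_equal)
    kept = [seg for idx, seg in enumerate(segs, start=1) if idx != c]
    return [bit for seg in kept for bit in seg]


def sparsity(kept, num_equal, a):
    """delta = b / n when each hash keeps `kept` short segments plus the long one."""
    return (kept + a) / (num_equal + a)


if __name__ == "__main__":
    n, a = 20, 2                      # hypothetical toy sizes: b_c = 2, b_9 = 4
    k_bits = [i % 2 for i in range(n)]
    print(len(hash_group(k_bits, 1, n, a)))   # 7 * b_c + b_9 output bits
    print(sparsity(7, 8, a))                  # delta = (7 + a) / (8 + a)
```

The same `sparsity` helper reproduces the endpoints quoted for the other designs, for example $(2+a)/(6+a)$ at $a = 8.81$ and $(3+a)/(8+a)$ at $a = 32$.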


B.4 Right Edge Degree Distribution

Clearly, the total number of edges is $KC$ in the bipartite graph since there are $K$ left nodes in the bipartite graph
and each left node has degree $C$. Therefore, since the expected number of right nodes with degree $j$ can be obtained
as $\Pr(\text{a right node has degree } j)\, CB$, the fraction $\rho_j$ can be obtained as

$$\rho_j = \frac{\Pr(\text{a right node has degree } j)\, CBj}{KC} = j\eta \Pr(\text{a right node has degree } j), \tag{78}$$

where we have used $B = \eta K$ and $\eta$ is the redundancy parameter. According to the "balls-and-bins" model, the
degree of a right node follows the binomial distribution $B(1/(\eta K), K)$ and can be well approximated by a Poisson
variable as

$$\Pr(\text{a right node has degree } j) \approx \frac{(1/\eta)^j e^{-1/\eta}}{j!}. \tag{79}$$

As a result, the fraction $\rho_j$ of edges connected to right nodes having degree $j$ is obtained as (ig.
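As a quick numerical sanity check of (78) and (79), the following sketch compares the binomial "balls-and-bins" degree distribution with its Poisson approximation and confirms that the edge fractions $\rho_j$ sum to one. The value $\eta = 0.4073$ is the redundancy quoted earlier in this appendix; $K = 2000$ and the function names are hypothetical choices made only for illustration.

```python
import math


def poisson_pmf(j, lam):
    """Pr(X = j) for X ~ Poisson(lam)."""
    return lam ** j * math.exp(-lam) / math.factorial(j)


def binomial_pmf(j, trials, p):
    """Pr(X = j) for X ~ Binomial(trials, p)."""
    return math.comb(trials, j) * p ** j * (1 - p) ** (trials - j)


def edge_fraction(j, eta):
    """rho_j = j * eta * Pr(degree = j) under the Poisson approximation."""
    return j * eta * poisson_pmf(j, 1.0 / eta)


eta = 0.4073   # redundancy parameter quoted earlier in the appendix
K = 2000       # hypothetical sparsity, for illustration only

# The edge fractions form a probability distribution over degrees j >= 1.
total = sum(edge_fraction(j, eta) for j in range(1, 60))

# Largest pointwise gap between the binomial law and its Poisson approximation.
gap = max(
    abs(binomial_pmf(j, K, 1.0 / (eta * K)) - poisson_pmf(j, 1.0 / eta))
    for j in range(15)
)

if __name__ == "__main__":
    print(total, gap)
```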


B.5 Proof of Mean Performance

Let $Z_i^e \in \{0,1\}$ be the random variable denoting the presence of edge $e$ after $i$ iterations, thus

$$Z_i = \sum_{e=1}^{KC} Z_i^e. \tag{80}$$

Since each edge is peeled off independently given the event $\mathcal{T}_i$, the expected number of remaining edges over
cycle-free graphs can be obtained as

$$\mathbb{E}[Z_i \mid \mathcal{T}_i] = \sum_{e=1}^{KC} \mathbb{E}[Z_i^e \mid \mathcal{T}_i] = KC p_i, \tag{81}$$

where by definition $p_i = \Pr(Z_i^e = 1 \mid \mathcal{T}_i)$ is the conditional probability that an edge is present in the $i$-th peeling iteration,
conditioned on the event $\mathcal{T}_i$ studied in the density evolution equation (|4^). We are interested in the evolution of
this probability $p_i$. In the following, we prove that for any given $\epsilon > 0$, there exists a finite number of iterations
$i > 0$ such that $p_i < \epsilon/4$, which leads to our desired result in (|4^).


B.6 Concentration Analysis

B.6.1 Proof of Mean Analysis on General Graphs from Ensembles

From (80), we have

$$\mathbb{E}[Z_i] = \sum_{e=1}^{KC} \mathbb{E}[Z_i^e] = KC\, \mathbb{E}[Z_i^e]. \tag{82}$$

From basic probability laws, we have

$$\mathbb{E}[Z_i^e] = \mathbb{E}[Z_i^e \mid \mathcal{T}_i] \Pr(\mathcal{T}_i) + \mathbb{E}[Z_i^e \mid \mathcal{T}_i^c] \Pr(\mathcal{T}_i^c).$$

Recall from the density evolution analysis that $\mathbb{E}[Z_i^e \mid \mathcal{T}_i] = p_i$. Since we have

$$\Pr(\mathcal{T}_i) \le 1, \quad \mathbb{E}[Z_i^e \mid \mathcal{T}_i^c] \le 1, \tag{83}$$

the following holds:

$$p_i - \Pr(\mathcal{T}_i^c) \le \mathbb{E}[Z_i^e] \le p_i + \Pr(\mathcal{T}_i^c). \tag{84}$$

If the probability of a general graph not behaving like a tree can be made arbitrarily small for any $\epsilon > 0$,

$$\Pr(\mathcal{T}_i^c) < \frac{\epsilon}{4}, \tag{85}$$

then we can obtain the result in (|4^) by letting $p_i = \epsilon/4$ in the density evolution analysis. Next, we show that (85)
holds for sufficiently large $K$.

Lemma 6. For any given constant $\epsilon > 0$ and iteration $i > 0$, there exists some absolute constant $K_0 > 0$ such
that

$$\Pr(\mathcal{T}_i^c) \le c_0 \frac{\log^i K}{K} \tag{86}$$

for some constant $c_0 > 0$ as long as $K > K_0$.

From this lemma, we can see that for an arbitrary $\epsilon > 0$, the result follows as long as $K > K_0$, where $K_0$ is
the smallest constant that satisfies $K_0/\log^i K_0 > 4c_0/\epsilon$ given $\epsilon$ and $i$. In the following we give the proof of the
lemma.

Proof. Let $C_i$ be the number of check nodes and $V_i$ be the number of variable nodes in the depth-$i$ neighborhood of an edge $e$.
Because the graph ensemble $\mathcal{G}(K, \eta, C, \{M_c\}_{c\in[C]})$ follows Poisson distributions on the right, the results in [22]
are not readily applied here. Therefore, the key idea is to prove that the size of the tree neighborhood is bounded
by $O(\log^i K)$ with high probability, which is intuitively clear since Poisson distributions have very light tails due
to the exponential decay.

To show this, we unfold the neighborhood of an edge $e$ up to level $L$, and at each level $i$ we upper bound the
probability that the size of the tree grows larger than $O(\log^i K)$. Specifically, from the law of total probability, we
upper bound the probability of not having a tree as follows for some $\kappa_1 > 0$:

$$\Pr(\mathcal{T}_i^c) \le \Pr(V_i > \kappa_1 \log^i K) + \Pr(C_i > \kappa_1 \log^i K) \tag{87}$$
$$\qquad + \Pr(\mathcal{T}_i^c \mid V_i < \kappa_1 \log^i K,\ C_i < \kappa_1 \log^i K). \tag{88}$$

Denoting $\alpha_i = \Pr(V_i > \kappa_1 \log^i K)$, we bound the first term using the law of total probability as follows:

$$\alpha_i \le \alpha_{i-1} + \Pr(V_i > \kappa_1 \log^i K \mid V_{i-1} < \kappa_1 \log^{i-1} K).$$

Given $V_{i-1} < \kappa_1 \log^{i-1} K$, we have $C_i < \kappa_2 \log^{i-1} K$ at depth $i$ for some $\kappa_2 > 0$ since the left degree of any
graph from both ensembles is upper bounded by $d$ and $D$ respectively, which are both constants. Therefore, the
second term in the above recursion can be bounded as

$$\Pr(V_i > \kappa_1 \log^i K \mid V_{i-1} < \kappa_1 \log^{i-1} K) \le \Pr(V_i > \kappa_1 \log^i K \mid C_i < \kappa_2 \log^{i-1} K). \tag{89}$$


Now let the number of check nodes at exactly depth $i$ be $M_i$ and let $D_j$, $j = 1, \cdots, M_i$, be the degrees of these check nodes.
The right hand side can be evaluated as

$$\Pr(V_i > \kappa_1 \log^i K \mid C_i < \kappa_2 \log^{i-1} K) \le \Pr\left(\sum_{j=1}^{M_i} D_j \ge \kappa_3 \log^i K\right) \tag{90}$$

for some $\kappa_3 > 0$. Since the check node degrees are Poisson variables with rate $1/\eta$ and the number of check nodes
at depth $i$ is less than the total number of check nodes up to depth $i$ such that $M_i \le C_i < \kappa_2 \log^{i-1} K$, the
probability can be upper bounded with the Poisson tail bound $\Pr(D \ge x) \le (e\lambda/x)^x$ as

$$\Pr\left(\sum_{j=1}^{M_i} D_j \ge \kappa_3 \log^i K\right) \le \left(\frac{e M_i/\eta}{\kappa_3 \log^i K}\right)^{\kappa_3 \log^i K} \le \left(\frac{\kappa_4}{\log K}\right)^{\kappa_3 \log^i K} \le \frac{\kappa_5}{K} \tag{91}$$

for some $\kappa_4 > 0$ and $\kappa_5 > 0$. Therefore we have

$$\alpha_i \le \alpha_{i-1} + \frac{\kappa_5}{K}, \tag{92}$$

and thus the number of variable nodes exposed until the $i$-th iteration can be bounded by $\log^i K$ with high probability,
$\Pr(V_i > \kappa_1 \log^i K) \le O(1/K)$. A similar technique can be used to show that the tail bound for the check
nodes is $\Pr(C_i > \kappa_1 \log^i K) \le O(1/K)$.
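The Poisson tail bound $\Pr(D \ge x) \le (e\lambda/x)^x$ used in (91) can be checked numerically. The sketch below is an illustration only: it evaluates the exact tail in the log domain and compares it with the bound for $\lambda = 1/\eta$ with $\eta = 0.4073$; the function names and the truncation length `terms` are assumptions of this example. Since a sum of $M_i$ independent Poisson($1/\eta$) degrees is Poisson($M_i/\eta$), the same check applies to the sum in (90) with $\lambda = M_i/\eta$.

```python
import math


def poisson_tail(x, lam, terms=200):
    """Pr(X >= x) for X ~ Poisson(lam), summed directly over the upper tail.

    Each term is evaluated as exp(j*log(lam) - lam - log(j!)) to avoid
    overflow; `terms` truncates the (rapidly decaying) series.
    """
    return sum(
        math.exp(j * math.log(lam) - lam - math.lgamma(j + 1))
        for j in range(x, x + terms)
    )


def chernoff_bound(x, lam):
    """The (e * lam / x)^x bound; it is only informative when x > e * lam."""
    return (math.e * lam / x) ** x


lam = 1.0 / 0.4073  # Poisson rate 1/eta for a single check node
checks = [(x, poisson_tail(x, lam), chernoff_bound(x, lam)) for x in range(1, 40)]

if __name__ == "__main__":
    for x, tail, bound in checks[::6]:
        print(x, tail, bound)
```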

It has been shown that the number of nodes is well bounded by $O(\log^i K)$; now we proceed to show the tree-like
neighborhood of our graph ensemble by induction. Assuming that the neighborhood at the $i$-th iteration
($i < i^*$) is tree-like, we prove that the neighborhood at the $(i+1)$-th iteration is tree-like with high probability.

First of all, we examine the neighborhood obtained by adding the next level of check nodes. Assume that $t$ additional edges have been revealed at this
level without forming a cycle. The probability that the next edge from a variable node does not create a cycle is the
probability that it is connected to one of the check nodes that are not already included in the tree, which is lower
bounded by $1 - C_{i+1}/(\eta K)$. Therefore, given that the neighborhood at the $i$-th iteration is tree-like, the probability that it remains tree-like after adding these check nodes is lower
bounded by

$$\left(1 - \frac{C_{i+1}}{\eta K}\right)^{C_{i+1} - C_i}. \tag{93}$$

By similar reasoning, given that this extended neighborhood is tree-like, the probability that it remains tree-like after adding the next level of variable nodes is lower bounded by

$$\left(1 - \frac{V_{i+1}}{K}\right)^{V_{i+1} - V_i}.$$

Therefore, the probability that the neighborhood at the $(i+1)$-th iteration is tree-like is lower bounded by

$$\left(1 - \frac{C_{i+1}}{\eta K}\right)^{C_{i+1} - C_i} \left(1 - \frac{V_{i+1}}{K}\right)^{V_{i+1} - V_i} \ge 1 - \left(\frac{V_{i+1}^2}{K} + \frac{C_{i+1}^2}{\eta K}\right). \tag{94}$$

Therefore the probability of not being tree-like is upper bounded by

$$\Pr(\mathcal{T}_i^c) \le c_0 \frac{\log^i K}{K} \tag{95}$$

for some absolute constant $c_0 > 0$. □


B.6.2 Proof of Concentration to Mean by Large Deviation Analysis

Now it remains to show the concentration of $Z_i$ around its mean $\mathbb{E}[Z_i]$. According to (80), the number of remaining
edges is a sum of random variables $Z_i = \sum_{e=1}^{KC} Z_i^e$, whose summands are not independent of each other.
Therefore, to show the concentration, we use a standard martingale argument and Azuma's inequality provided
in [22], with some modifications to account for the irregular degrees of the right nodes.

Suppose that we expose the whole set of $KC$ edges of the graph one at a time. We let

$$Y_\ell = \mathbb{E}\left[Z_i \mid \text{the first } \ell \text{ exposed edges}\right], \quad \ell = 1, \cdots, KC. \tag{96}$$

By definition, $Y_0, Y_1, \cdots, Y_{KC}$ form a Doob martingale process, where $Y_0 = \mathbb{E}[Z_i]$ and $Y_{KC} = Z_i$. To use
Azuma's inequality, it is required that $|Y_{\ell+1} - Y_\ell| \le \Delta_\ell$ for some $\Delta_\ell > 0$. If the variable node has a regular
degree $d_v$ and the check node has a regular degree $d_c$, then [22] shows that $\Delta_\ell = 8(d_v d_c)^i$ with $i$ being the
number of peeling iterations. However, although we have a regular left degree $d_v = C$ in our graph ensemble
$\mathcal{G}(K, \eta, C, \{M_c\}_{c\in[C]})$, the check nodes do not have a regular degree $d_c$, and this case therefore requires further analysis.


Proof of Finite Difference A^ 

To prove that the difference A^ is finite for check node degrees with Poisson distributions, we first prove that the 
degree of all the check nodes can be upper bounded by dc < 0 (iT®+T) with probabilit>^at least 

ciii'exp (^—C 2 K 4i+i ^ 

for some constants ci and C 2 . Let B be the event that at least one check node has more than O edges, 

then for some C 3 > 0 we have 


Pr {B) < c^K exp ( —C 2 K 4^+1 


(97) 


by applying a union bound on all the R = rjK check nodes of the graphs from G{K, r], C, {Mc}cg[c])- a result, 
under the complement event B'^, we have 


Ai = 0 . 


(98) 


^Let A be a Poisson variable with parameter A, then the following holds 

cK 4i + l 


7 ^ \ c±\ ***'-rj. 

Pr > cK ^— J < Cl exp C 2 ff 4 J+t^ 


for some $c_1$ and $c_2$.

Large Deviation by Azuma's Inequality

For any given $\epsilon > 0$, the tail probability of the event $Z_i > KC\epsilon$ can be computed as


Pr(|Z,-E[Z,]|>^) 


<Pri\Z,-E[Zi]\ > 


KCe 

2 



+ Pr {B) 


< 2 exp 

< 2 exp 





+ c^K exp ( —C2iF4i+i 


where C4 is some constant depending on C, rj and all the other constants ci, C2, C3. This concludes our proof for 


B.7 Proof of Expander Graphs 

Consider the event that an arbitrary subset of left nodes $S$ of size $|S| = s$ has fewer than $s/2$ neighbors in all $C$ subsampling groups.
The probability of this event can be obtained readily for any size $s$ as


Pr (5) < 



K\ fr]K 


c 


s J \s/2j \2r]K 


Cs 


(99) 

( 100 ) 


where we have used the fact that the number of check nodes is 


rjK. Using the inequality (^) < (ae/b)^, we have 


Pr (5) < 



~Jj2) [^) 



( 101 ) 


where c = e(e/r})'^^‘^ is some constant. Clearly, as long as C/2 — 1 > 1/2 such that C > 3, we can further bound 
the probability as 


Pr (5) < 



( 102 ) 


It can be seen that the probability of not forming an expander depends on the size of the remaining subset |5| = s. 
Now we examine two extremes with s = 0{K) and s = 0(1), and obtain the following: 


Pr (5) 



s = eK with e < 1/ (2c^) 
s = 0(l). 


(103) 


Clearly, the bottleneck event is when the graph is left with $s = O(1)$ variable nodes, which happens also with
probability approaching zero asymptotically in $K$. Therefore, the random graphs from the ensemble are good
expanders with probability at least $1 - O(1/K)$.

C Proof of Proposition

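Given a single-ton bin with an index-value pair $(\mathbf{k}, X[\mathbf{k}])$,

$$U_p = X[\mathbf{k}](-1)^{\langle \mathbf{d}_p, \mathbf{k} \rangle} + W_p, \quad p \in [P], \tag{104}$$

it is clear that $\mathrm{sgn}[U_p] = \langle \mathbf{d}_p, \mathbf{k} \rangle \oplus \mathrm{sgn}[X[\mathbf{k}]] \oplus 1$ whenever the noise $W_p$ is sufficiently large such that it crosses
over $X[\mathbf{k}](-1)^{\langle \mathbf{d}_p, \mathbf{k} \rangle}$. Clearly, this is a random event and we can model it with some Bernoulli variable $Z_p \in \{0,1\}$
with some probability $P_Z$:

$$\mathrm{sgn}[U_p] = \langle \mathbf{d}_p, \mathbf{k} \rangle \oplus \mathrm{sgn}[X[\mathbf{k}]] \oplus Z_p. \tag{105}$$

The exact parameter $P_Z$ of the Bernoulli random variable $Z_p$ can be found by studying the tail events that trigger
the flipping, but here for simplicity we directly upper bound it as follows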

X[k]p 

PZ < Pe := Pr^{\Wp\ > |X[k]|) < e (106) 

D Proof of Theorem Peeling Decoder using a Robust Bin Detector

Let $E_{\mathrm{bin}}$ be the event where the robust bin detector makes a mistake in the $O(K)$ peeling iterations. If the error
probability of the robust bin detector described in Section [^ satisfies

$$\Pr(E_{\mathrm{bin}}) = O(1/K), \tag{107}$$

then the result directly follows from the Bayes rule:

$$P_F = \Pr\left(\mathrm{supp}(\widehat{X}) \ne \mathrm{supp}(X) \mid E_{\mathrm{bin}}^c\right) \Pr(E_{\mathrm{bin}}^c) + \Pr\left(\mathrm{supp}(\widehat{X}) \ne \mathrm{supp}(X) \mid E_{\mathrm{bin}}\right) \Pr(E_{\mathrm{bin}})$$
$$\le \Pr\left(\mathrm{supp}(\widehat{X}) \ne \mathrm{supp}(X) \mid E_{\mathrm{bin}}^c\right) + \Pr(E_{\mathrm{bin}}) = O(1/K),$$

where the first term in the last inequality is obtained from Theorem for the peeling decoder with an oracle
such that the event $E_{\mathrm{bin}}^c$ holds. Therefore, it remains to show that (107) holds. The main idea is to analyze the
probability of making at least one error on any bin observation, followed by a union bound over all the bin observations.
Denoting the error event in any bin $j$ by $E_j$, we have the following union bound across the $\eta K$ bin observation vectors
as well as the $CK$ iterations*:

$$\Pr(E_{\mathrm{bin}}) \le CK \sum_{j=1}^{\eta K} \Pr(E_j), \tag{108}$$

where $C$ is the left degree of the regular ensemble $\mathcal{G}(K, \eta, C, \{M_c\}_{c\in[C]})$. Without loss of generality, we drop the
bin index $j$ and use a union bound over all bins such that

$$\Pr(E_{\mathrm{bin}}) \le \eta C K^2 \Pr(E), \tag{109}$$

where $\Pr(E)$ is the error probability for an arbitrary bin. It can be seen that due to the union bounds, it is required
that $\Pr(E) \le O(1/K^3)$ such that $\Pr(E_{\mathrm{bin}}) \le O(1/K)$.

In the following, we prove that $\Pr(E) \le O(1/K^3)$ holds using the generic model in Proposition]^. Since there
are different types of errors, in the following analysis $\boldsymbol{\alpha}$ in ([T8|) is fixed as a zero-ton, single-ton or multi-ton
respectively for each class of errors.

*The number of iterations is taken to be the worst case where at each iteration only one edge is peeled off.

Definition 6. The error probability $\Pr(E)$ for an arbitrary bin can be upper bounded as

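$$\Pr(E) \le \sum_{\mathcal{F} \in \{\mathcal{H}_Z, \mathcal{H}_M\}} \Pr\left(\mathcal{F} \leftarrow \mathcal{H}_S(\mathbf{k}, X[\mathbf{k}])\right) + \Pr\left(\mathcal{H}_S(\widehat{\mathbf{k}}, \widehat{X}[\widehat{\mathbf{k}}]) \leftarrow \mathcal{F}\right) \tag{110}$$
$$\qquad + \Pr\left(\mathcal{H}_S(\widehat{\mathbf{k}}, \widehat{X}[\widehat{\mathbf{k}}]) \leftarrow \mathcal{H}_S(\mathbf{k}, X[\mathbf{k}])\right), \tag{111}$$

where $\mathcal{F}$ is either a zero-ton $\mathcal{H}_Z$ or a multi-ton $\mathcal{H}_M$ and

1. $\Pr(\mathcal{F} \leftarrow \mathcal{H}_S(\mathbf{k}, X[\mathbf{k}]))$ is called the missed verification rate, in which the single-ton verification fails when
the ground truth is in fact a single-ton $\mathcal{H} = \mathcal{H}_S(\mathbf{k}, X[\mathbf{k}])$ for some $\mathbf{k} \in \mathbb{F}_2^n$ and $X[\mathbf{k}]$.

2. $\Pr(\mathcal{H}_S(\widehat{\mathbf{k}}, \widehat{X}[\widehat{\mathbf{k}}]) \leftarrow \mathcal{F})$ is called the false verification rate, in which the single-ton verification is passed
for some single-ton $\mathcal{H} = \mathcal{H}_S(\widehat{\mathbf{k}}, \widehat{X}[\widehat{\mathbf{k}}])$ with an index-value pair $(\widehat{\mathbf{k}}, \widehat{X}[\widehat{\mathbf{k}}])$ when the ground truth is $\mathcal{F} \in \{\mathcal{H}_Z, \mathcal{H}_M\}$.

3. $\Pr(\mathcal{H}_S(\widehat{\mathbf{k}}, \widehat{X}[\widehat{\mathbf{k}}]) \leftarrow \mathcal{H}_S(\mathbf{k}, X[\mathbf{k}]))$ is called the crossed verification rate, in which a single-ton with a wrong
index-value pair $\widehat{\mathbf{k}} \ne \mathbf{k}$, $\widehat{X}[\widehat{\mathbf{k}}] \ne X[\mathbf{k}]$ passes the single-ton verification when the ground truth is a single-ton
with an index-value pair $\mathcal{H} = \mathcal{H}_S(\mathbf{k}, X[\mathbf{k}])$ for some $\mathbf{k} \ne \widehat{\mathbf{k}}$.

The false verification rate, missed verification rate and crossed verification rate for the near-linear time and
sub-linear time recovery schemes are given in the following propositions.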

Proposition 5 (False Verification Rate). For any 0 < 7 <SNR/ 2 , the false verification rate for each bin hypothesis 
can be upper bounded as follows: 


Pr H5(k,V[k])^'Hz <e 


,-a(vT+27-l)' 


.Ti ^ 


-f 1- 


2-yv^ 


Pi 


□ 


Pr [ns{KX[k])^nM) < e 4 1+"^^ + Ne 

where Pi is the number of the random offsets in the NSO-SPRIGHT and the SO-SPRIGHT algorithm. 

Proof. See Appendix]^ 

Proposition 6 (Missed Verification Rate). For any 0 < 7 < SNR/2, the missed verification rate for each bin 
hypothesis can be upper bounded as follows: 

Pi 

Pr(Hz^Hs(k,X[k])) < e " 


{i-2ey 


Pr {Pm ^ PsiK X[k])) < 2e"^^i + 


2ne 

(/3/Pe-l)^ 

2e 3 


Pi 


(l-2Pe)'^ 


P2 


NSO-SPRIGHT 

SO-SPRIGHT. 


+ 2e 

where Pi is the number of random offsets in the NSO-SPRIGHT and the SO-SPRIGHT algorithm, while P 2 and 
P3 are the numbers of the zero offsets and coded offsets in the SO-SPRIGHT algorithm. 

Proof. See Appendix]^ □ 

Proposition 7 (Crossed Verification Rate). For any 0 < 7 < SNR/2, the false verification rate for each bin 
hypothesis can be upper bounded as follows: 


Pi -i-i 


1- 


Pi 


Pr (^7(s(k,X[k]) ^?f5(k,X[k])j < e 4 1+47 -|- 2Ne 

where Pi is the number of random offsets in the NSO-SPRIGHT and the SO-SPRIGHT algorithm. 

Proof. See Appendix [G| 

Since all the error probabilities decay exponentially with respect to {Piff^^, it is now clear that if Pi is chosen 
as Pj = 0{n) = 0(log A”), the probability can be bounded as Pr (P) = 0(1/A^) such that Pr (Pbin) = 0(1/^)- 


□

E Proof of False Verification Rates in Proposition]^

The false verification events occur when the ground truth is not a single-ton, and therefore, the probabilities can be
obtained using the bin observation model

$$\mathbf{U} = \mathbf{S}\boldsymbol{\alpha} + \mathbf{W} \tag{112}$$

with $\boldsymbol{\alpha}$ being a zero-ton $\boldsymbol{\alpha} = \mathbf{0}$ or a multi-ton $|\mathrm{supp}(\boldsymbol{\alpha})| > 1$. With a slight abuse of notation, here $\mathbf{S}$
is the codebook associated with the $P_1$ fully random offsets in the NSO-SPRIGHT and SO-SPRIGHT algorithms.

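E.1 Detecting a Zero-ton as a Single-ton

By definition, the probability of detecting a zero-ton as a single-ton can be upper bounded by the probability of a
zero-ton failing the zero-ton verification:

$$\Pr\left(\mathcal{H}_S(\widehat{\mathbf{k}}, \widehat{X}[\widehat{\mathbf{k}}]) \leftarrow \mathcal{H}_Z\right) \le \Pr\left(\frac{1}{P_1}\|\mathbf{W}\|^2 \ge (1+\gamma)\nu^2\right).$$

Since $\mathbf{W} \sim \mathcal{N}(\mathbf{0}, \nu^2 \mathbf{I})$, we can bound this probability using Lemma 11 (with $\mathbf{g} = \mathbf{0}$ and $\tau_1 = (1+\gamma)\nu^2$):

$$\Pr\left(\frac{1}{P_1}\|\mathbf{W}\|^2 \ge (1+\gamma)\nu^2\right) \le e^{-\frac{P_1}{4}\left(\sqrt{1+2\gamma} - 1\right)^2}.$$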


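E.2 Detecting a Multi-ton as a Single-ton

By definition, the error probability can be evaluated under the multi-ton model when it passes the single-ton
verification step for some index-value pair $(\widehat{\mathbf{k}}, \widehat{X}[\widehat{\mathbf{k}}])$:

$$\Pr\left(\mathcal{H}_S(\widehat{\mathbf{k}}, \widehat{X}[\widehat{\mathbf{k}}]) \leftarrow \mathcal{H}_M\right) = \Pr\left(\frac{1}{P_1}\left\|\mathbf{U} - \widehat{X}[\widehat{\mathbf{k}}]\mathbf{s}_{\widehat{\mathbf{k}}}\right\|^2 \le (1+\gamma)\nu^2\right)$$

given some multi-ton observation $\mathbf{U} = \mathbf{S}\boldsymbol{\alpha} + \mathbf{W}$. Letting $\mathbf{g} = \mathbf{S}(\boldsymbol{\alpha} - \widehat{X}[\widehat{\mathbf{k}}]\mathbf{e}_{\widehat{\mathbf{k}}})$ and $\mathbf{v} = \mathbf{W}$, we compute this
probability according to the total probability law as follows:

$$\Pr\left(\frac{1}{P_1}\|\mathbf{g}+\mathbf{v}\|^2 \le (1+\gamma)\nu^2\right)$$
$$= \Pr\left(\frac{1}{P_1}\|\mathbf{g}+\mathbf{v}\|^2 \le (1+\gamma)\nu^2 \,\Big|\, \frac{\|\mathbf{g}\|^2}{P_1} \ge 2\gamma\nu^2\right) \Pr\left(\frac{\|\mathbf{g}\|^2}{P_1} \ge 2\gamma\nu^2\right)$$
$$\quad + \Pr\left(\frac{1}{P_1}\|\mathbf{g}+\mathbf{v}\|^2 \le (1+\gamma)\nu^2 \,\Big|\, \frac{\|\mathbf{g}\|^2}{P_1} < 2\gamma\nu^2\right) \Pr\left(\frac{\|\mathbf{g}\|^2}{P_1} < 2\gamma\nu^2\right)$$
$$\le \Pr\left(\frac{1}{P_1}\|\mathbf{g}+\mathbf{v}\|^2 \le (1+\gamma)\nu^2 \,\Big|\, \frac{\|\mathbf{g}\|^2}{P_1} \ge 2\gamma\nu^2\right) + \Pr\left(\frac{\|\mathbf{g}\|^2}{P_1} < 2\gamma\nu^2\right), \tag{113}$$

where the first term is basically the single-ton verification error rate when the multi-ton has sufficiently large energy
while the second term is the probability of any multi-ton not having sufficiently large energy. In the following, we
bound these two probabilities separately with exponential tails.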

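We start from the single-ton verification error rate when the multi-ton has sufficiently large energy, namely
$\Pr\left(\frac{1}{P_1}\|\mathbf{g}+\mathbf{v}\|^2 \le (1+\gamma)\nu^2 \mid \|\mathbf{g}\|^2/P_1 \ge 2\gamma\nu^2\right)$. Lemma 11 can be directly used here by letting $\tau_2 = (1+\gamma)\nu^2$.
Note that the first term is conditioned on the event where $\|\mathbf{g}\|^2/P_1 \ge 2\gamma\nu^2$, therefore the minimum normalized
non-centrality parameter can be obtained as $\theta_{\min} = \min_{\mathbf{g}} \|\mathbf{g}\|^2/(P_1\nu^2) = 2\gamma$. Clearly, the condition for the threshold
in (155) holds for Corollary 1 and thus the first term can be bounded accordingly as

$$\Pr\left(\frac{1}{P_1}\|\mathbf{g}+\mathbf{v}\|^2 \le (1+\gamma)\nu^2 \,\Big|\, \frac{\|\mathbf{g}\|^2}{P_1} \ge 2\gamma\nu^2\right) \le e^{-\frac{P_1}{4}\frac{\gamma^2}{1+4\gamma}}. \tag{114}$$

Now we examine the probability of a multi-ton not having sufficiently large energy, namely $\Pr\left(\|\mathbf{g}\|^2/P_1 < 2\gamma\nu^2\right)$.
Letting $\boldsymbol{\beta} = \boldsymbol{\alpha} - \widehat{X}[\widehat{\mathbf{k}}]\mathbf{e}_{\widehat{\mathbf{k}}}$, we have $\mathbf{g} = \mathbf{S}\boldsymbol{\beta}$ and thus

$$\Pr\left(\frac{\|\mathbf{g}\|^2}{P_1} < 2\gamma\nu^2\right) = \Pr\left(\frac{\|\mathbf{S}\boldsymbol{\beta}\|^2}{P_1} < 2\gamma\nu^2\right). \tag{115}$$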


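Denoting the support by $\mathcal{L} := \mathrm{supp}(\boldsymbol{\beta})$, we bound this probability with respect to the following two multi-ton
scenarios:

• $|\mathcal{L}| = L = O(1)$ where the multi-ton size is a constant. Note that $\|\mathbf{S}\boldsymbol{\beta}\|^2 = \|\mathbf{S}_{\mathcal{L}}\boldsymbol{\beta}_{\mathcal{L}}\|^2$ where $\mathbf{S}_{\mathcal{L}}$ is the
sub-matrix consisting of the columns $\mathbf{k} \in \mathcal{L}$ and $\boldsymbol{\beta}_{\mathcal{L}}$ is the sub-vector containing the elements in the set
$\mathbf{k} \in \mathcal{L}$. Then, we have

$$\lambda_{\min}(\mathbf{S}_{\mathcal{L}}^T\mathbf{S}_{\mathcal{L}}) \|\boldsymbol{\beta}_{\mathcal{L}}\|^2 \le \|\mathbf{S}_{\mathcal{L}}\boldsymbol{\beta}_{\mathcal{L}}\|^2 \le \lambda_{\max}(\mathbf{S}_{\mathcal{L}}^T\mathbf{S}_{\mathcal{L}}) \|\boldsymbol{\beta}_{\mathcal{L}}\|^2. \tag{116}$$

Using $\|\boldsymbol{\beta}_{\mathcal{L}}\|^2 \ge L\rho^2$, the probability can be bounded as

$$\Pr\left(\frac{\|\mathbf{S}\boldsymbol{\beta}\|^2}{P_1} < 2\gamma\nu^2\right) = \Pr\left(\frac{\|\mathbf{S}_{\mathcal{L}}\boldsymbol{\beta}_{\mathcal{L}}\|^2}{P_1} < 2\gamma\nu^2\right) \tag{117}$$
$$\le \Pr\left(\lambda_{\min}\left(\frac{1}{P_1}\mathbf{S}_{\mathcal{L}}^T\mathbf{S}_{\mathcal{L}}\right) \|\boldsymbol{\beta}_{\mathcal{L}}\|^2 < 2\gamma\nu^2\right) \tag{118}$$
$$\le \Pr\left(\lambda_{\min}\left(\frac{1}{P_1}\mathbf{S}_{\mathcal{L}}^T\mathbf{S}_{\mathcal{L}}\right) < \frac{2\gamma\nu^2}{L\rho^2}\right). \tag{119}$$

Lemma 7. Denote the mutual coherence of the codebook $\mathbf{S}$ by $\mu := \max_{\mathbf{k} \ne \mathbf{k}'} |\mathbf{s}_{\mathbf{k}}^T \mathbf{s}_{\mathbf{k}'}|/P_1$. Then for some
given $\mu_0 > 0$, we have $\Pr(\mu > \mu_0) \le 2N e^{-\frac{\mu_0^2}{2}P_1}$.

Proof. Since $\mathbf{S}$ contains i.i.d. Rademacher entries, the result follows by a simple Hoeffding bound. □

According to the Gershgorin Circle Theorem,

$$\lambda_{\min}\left(\frac{1}{P_1}\mathbf{S}_{\mathcal{L}}^T\mathbf{S}_{\mathcal{L}}\right) \ge 1 - L\mu, \tag{120}$$

we have the following bound

$$\Pr\left(\lambda_{\min}\left(\frac{1}{P_1}\mathbf{S}_{\mathcal{L}}^T\mathbf{S}_{\mathcal{L}}\right) < \frac{2\gamma\nu^2}{L\rho^2}\right) \le \Pr\left(1 - L\mu < \frac{2\gamma\nu^2}{L\rho^2}\right) \tag{121}$$
$$= \Pr\left(\mu > \frac{1}{L}\left(1 - \frac{2\gamma\nu^2}{L\rho^2}\right)\right). \tag{122}$$

By letting $\mu_0 = \frac{1}{L}\left(1 - \frac{2\gamma\nu^2}{L\rho^2}\right)$, we can upper bound this probability using Lemma 7 as

$$\Pr\left(\frac{\|\mathbf{S}\boldsymbol{\beta}\|^2}{P_1} < 2\gamma\nu^2\right) \le \Pr\left(\mu > \frac{1}{L}\left(1 - \frac{2\gamma\nu^2}{L\rho^2}\right)\right) \le 2N e^{-\frac{1}{2L^2}\left(1 - \frac{2\gamma\nu^2}{L\rho^2}\right)^2 P_1}, \tag{123}$$

which holds if $\gamma < L\rho^2/(2\nu^2)$.

• $|\mathcal{L}| = L = \omega(1)$ where the multi-ton size is not a constant and grows asymptotically with respect to $K$. As a
result, the vector of random variables $\mathbf{g} = \mathbf{S}_{\mathcal{L}}\boldsymbol{\beta}_{\mathcal{L}}$ becomes asymptotically Gaussian due to the central limit
theorem with zero mean and a covariance

$$\mathbb{E}[\mathbf{g}\mathbf{g}^T] = \mathbb{E}[\mathbf{S}_{\mathcal{L}}\boldsymbol{\beta}_{\mathcal{L}}\boldsymbol{\beta}_{\mathcal{L}}^T\mathbf{S}_{\mathcal{L}}^T] = L\rho^2\mathbf{I}.$$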

Therefore, from Lemma [TT] and Corollary [T] we have 


Pr 



< 



(124) 


which holds if 7 < Lp^ 
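The Gershgorin step used in the constant-size case above can be illustrated numerically. The sketch below draws a small Rademacher sub-codebook, computes its mutual coherence $\mu$, and checks that Rayleigh quotients of the normalized Gram matrix never fall below $1 - L\mu$ (every Rayleigh quotient is at least $\lambda_{\min}$). The sizes $P = 64$, $L = 4$, the seeds and all function names are hypothetical choices for this example, and pure-Python lists are used in place of a linear-algebra library.

```python
import random


def rademacher_codebook(P, L, seed=0):
    """P x L matrix with i.i.d. entries drawn uniformly from {-1, +1}."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(P)]


def normalized_gram(S):
    """G = S^T S / P, which has unit diagonal for +/-1 entries."""
    P, L = len(S), len(S[0])
    return [
        [sum(S[p][a] * S[p][b] for p in range(P)) / P for b in range(L)]
        for a in range(L)
    ]


def mutual_coherence(G):
    """Largest off-diagonal magnitude of the normalized Gram matrix."""
    L = len(G)
    return max(abs(G[a][b]) for a in range(L) for b in range(L) if a != b)


def rayleigh_quotient(G, x):
    """x^T G x / x^T x, which is always at least lambda_min(G)."""
    L = len(G)
    num = sum(x[a] * G[a][b] * x[b] for a in range(L) for b in range(L))
    return num / sum(v * v for v in x)


if __name__ == "__main__":
    L = 4
    G = normalized_gram(rademacher_codebook(64, L))
    mu = mutual_coherence(G)
    rng = random.Random(1)
    worst = min(
        rayleigh_quotient(G, [rng.gauss(0.0, 1.0) for _ in range(L)])
        for _ in range(1000)
    )
    print(mu, worst, 1 - L * mu)
```

Gershgorin actually gives the slightly sharper bound $1 - (L-1)\mu$; the looser $1 - L\mu$ form is the one used in the argument above.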

Finally, as long as 0 < 7 < p^l 2 iP‘, for any multi-ton there exists some constant e > 0 such that 


Pr 




< 2'^v^ < A"e 


-f 1- 




Pi 


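F Proof of Missed Verification Rates in Proposition

The missed verification events occur when the ground truth is a single-ton, and therefore, the probabilities are
obtained using the bin observation model with some index-value pair $(\mathbf{k}, X[\mathbf{k}])$:

$$\mathbf{U} = X[\mathbf{k}]\mathbf{s}_{\mathbf{k}} + \mathbf{W}. \tag{125}$$

With a slight abuse of notation, here $\mathbf{S}$ is the codebook associated with the fully random offsets in our designs.

F.1 Detecting a Single-ton as a Zero-ton

By definition, the probability of detecting a single-ton as a zero-ton can be upper bounded by the probability of a
single-ton passing the zero-ton verification:

$$\Pr\left(\mathcal{H}_Z \leftarrow \mathcal{H}_S(\mathbf{k}, X[\mathbf{k}])\right) \le \Pr\left(\frac{1}{P_1}\|X[\mathbf{k}]\mathbf{s}_{\mathbf{k}} + \mathbf{W}\|^2 \le (1+\gamma)\nu^2\right).$$

Since $\mathbf{W} \sim \mathcal{N}(\mathbf{0}, \nu^2\mathbf{I})$, we can bound this probability using Lemma 11 by letting $\mathbf{g} = X[\mathbf{k}]\mathbf{s}_{\mathbf{k}}$ and $\mathbf{v} = \mathbf{W}$, so that $\theta_0 = \rho^2/\nu^2$:

$$\Pr\left(\frac{1}{P_1}\|X[\mathbf{k}]\mathbf{s}_{\mathbf{k}} + \mathbf{W}\|^2 \le (1+\gamma)\nu^2\right) \le e^{-\frac{P_1}{4}\frac{(\rho^2/\nu^2 - \gamma)^2}{1 + 2\rho^2/\nu^2}},$$

which holds as long as $\gamma < \rho^2/\nu^2$.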


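F.2 Detecting a Single-ton as a Multi-ton

By definition, the error probability can be evaluated under the single-ton model when it fails the single-ton
verification step for some index-value pair $(\widehat{\mathbf{k}}, \widehat{X}[\widehat{\mathbf{k}}])$:

$$\Pr\left(\mathcal{H}_M \leftarrow \mathcal{H}_S(\mathbf{k}, X[\mathbf{k}])\right) = \Pr\left(\frac{1}{P_1}\left\|\mathbf{U} - \widehat{X}[\widehat{\mathbf{k}}]\mathbf{s}_{\widehat{\mathbf{k}}}\right\|^2 \ge (1+\gamma)\nu^2\right)$$

given some single-ton observation $\mathbf{U} = X[\mathbf{k}]\mathbf{s}_{\mathbf{k}} + \mathbf{W}$. Since the estimated index-value pair $(\widehat{\mathbf{k}}, \widehat{X}[\widehat{\mathbf{k}}])$ may or may
not be correct, the above probability can be bounded as follows. Writing $A$ for the event $\frac{1}{P_1}\|\mathbf{U} - \widehat{X}[\widehat{\mathbf{k}}]\mathbf{s}_{\widehat{\mathbf{k}}}\|^2 \ge (1+\gamma)\nu^2$,

$$\Pr(A) = \Pr\left(A \mid \widehat{X}[\widehat{\mathbf{k}}] \ne X[\mathbf{k}] \text{ or } \widehat{\mathbf{k}} \ne \mathbf{k}\right) \Pr\left(\widehat{X}[\widehat{\mathbf{k}}] \ne X[\mathbf{k}] \text{ or } \widehat{\mathbf{k}} \ne \mathbf{k}\right)$$
$$\quad + \Pr\left(A \mid \widehat{X}[\widehat{\mathbf{k}}] = X[\mathbf{k}] \text{ and } \widehat{\mathbf{k}} = \mathbf{k}\right) \Pr\left(\widehat{X}[\widehat{\mathbf{k}}] = X[\mathbf{k}] \text{ and } \widehat{\mathbf{k}} = \mathbf{k}\right)$$
$$\le \Pr\left(\widehat{X}[\widehat{\mathbf{k}}] \ne X[\mathbf{k}] \text{ or } \widehat{\mathbf{k}} \ne \mathbf{k}\right) + \Pr\left(A \mid \widehat{X}[\widehat{\mathbf{k}}] = X[\mathbf{k}] \text{ and } \widehat{\mathbf{k}} = \mathbf{k}\right).$$

It is clear that

$$\Pr\left(A \mid \widehat{X}[\widehat{\mathbf{k}}] = X[\mathbf{k}] \text{ and } \widehat{\mathbf{k}} = \mathbf{k}\right) \tag{126}$$
$$= \Pr\left(\frac{1}{P_1}\|\mathbf{W}\|^2 \ge (1+\gamma)\nu^2\right) \le e^{-\frac{P_1}{4}\left(\sqrt{1+2\gamma} - 1\right)^2}, \tag{127}$$

therefore we focus on bounding the first term $\Pr(\widehat{X}[\widehat{\mathbf{k}}] \ne X[\mathbf{k}] \text{ or } \widehat{\mathbf{k}} \ne \mathbf{k})$. From basic probability laws we have

$$\Pr\left(\widehat{X}[\widehat{\mathbf{k}}] \ne X[\mathbf{k}] \text{ or } \widehat{\mathbf{k}} \ne \mathbf{k}\right) \tag{128}$$
$$\le \Pr\left(\widehat{X}[\widehat{\mathbf{k}}] \ne X[\mathbf{k}]\right) + \Pr\left(\widehat{\mathbf{k}} \ne \mathbf{k}\right) \tag{129}$$
$$= \Pr\left(\widehat{X}[\widehat{\mathbf{k}}] \ne X[\mathbf{k}] \mid \widehat{\mathbf{k}} \ne \mathbf{k}\right)\Pr\left(\widehat{\mathbf{k}} \ne \mathbf{k}\right) + \Pr\left(\widehat{X}[\widehat{\mathbf{k}}] \ne X[\mathbf{k}] \mid \widehat{\mathbf{k}} = \mathbf{k}\right)\Pr\left(\widehat{\mathbf{k}} = \mathbf{k}\right) + \Pr\left(\widehat{\mathbf{k}} \ne \mathbf{k}\right) \tag{130}$$
$$\le \Pr\left(\widehat{X}[\widehat{\mathbf{k}}] \ne X[\mathbf{k}] \mid \widehat{\mathbf{k}} = \mathbf{k}\right) + 2\Pr\left(\widehat{\mathbf{k}} \ne \mathbf{k}\right). \tag{131}$$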

The first term is the detection error probability of a BPSK signal with amplitudes ±p, and can be bounded as 


Pr X[k]/X[k] 


k = k) < 


(132) 


Since the second term Pr ^k ^ kj is essentially the error probability of the single-ton search, we prove the fol¬ 
lowing lemmas for different bin detection schemes. 

Lemma 8 (Single-ton Search Error Probability of the NSO-SPRIGHT Algorithm). The single-ton search 
error probability of the NSO-SPRIGHT algorithm is upper bounded as 


Pr (k 7 b k) < ne 


( 1 - 29 )^ 

8 


Pi 


where Pi is the number of random offsets in the NSO-SPRIGHT design. 
Proof See Appendix |H.1[ 


(133) 


□ 


Lemma 9 (Single-ton Search Error Probability of the SO-SPRIGHT Algorithm). The single-ton search error 
probability of the SO-SPRIGHT algorithm is upper bounded as 


ifl/Fe-iy 


Pr (^k 7 b kj < e" ''3 ' ^3 + e"' s'" (134) 

where Pi is the number of the coded offsets G and P 2 is the number of zero offsets in the SO-SPRIGHT design. 
Proof. See Appendix |H.2[ □ 


(l-2Pe)" 


P2

G Proof of Crossed Verification Rates in Proposition]^

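A crossed verification implies that some wrong index-value pair $(\widehat{\mathbf{k}}, \widehat{X}[\widehat{\mathbf{k}}])$ passes the single-ton verification:

$$\Pr\left(\mathcal{H}_S(\widehat{\mathbf{k}}, \widehat{X}[\widehat{\mathbf{k}}]) \leftarrow \mathcal{H}_S(\mathbf{k}, X[\mathbf{k}])\right) = \Pr\left(\frac{1}{P_1}\left\|\mathbf{U} - \widehat{X}[\widehat{\mathbf{k}}]\mathbf{s}_{\widehat{\mathbf{k}}}\right\|^2 \le (1+\gamma)\nu^2\right)$$
$$= \Pr\left(\frac{1}{P_1}\left\|X[\mathbf{k}]\mathbf{s}_{\mathbf{k}} - \widehat{X}[\widehat{\mathbf{k}}]\mathbf{s}_{\widehat{\mathbf{k}}} + \mathbf{W}\right\|^2 \le (1+\gamma)\nu^2\right).$$

Letting $\mathbf{g} = X[\mathbf{k}]\mathbf{s}_{\mathbf{k}} - \widehat{X}[\widehat{\mathbf{k}}]\mathbf{s}_{\widehat{\mathbf{k}}}$, this can be re-written as

$$\Pr\left(\mathcal{H}_S(\widehat{\mathbf{k}}, \widehat{X}[\widehat{\mathbf{k}}]) \leftarrow \mathcal{H}_S(\mathbf{k}, X[\mathbf{k}])\right) = \Pr\left(\frac{1}{P_1}\|\mathbf{g} + \mathbf{W}\|^2 \le (1+\gamma)\nu^2\right).$$

Similar to (113), we have

$$\Pr\left(\frac{1}{P_1}\|\mathbf{g} + \mathbf{W}\|^2 \le (1+\gamma)\nu^2\right) \le \Pr\left(\frac{1}{P_1}\|\mathbf{g} + \mathbf{W}\|^2 \le (1+\gamma)\nu^2 \,\Big|\, \frac{\|\mathbf{g}\|^2}{P_1} \ge 2\gamma\nu^2\right) + \Pr\left(\frac{\|\mathbf{g}\|^2}{P_1} < 2\gamma\nu^2\right).$$

Similar to (114), the first term can be bounded as

$$\Pr\left(\frac{1}{P_1}\|\mathbf{g} + \mathbf{W}\|^2 \le (1+\gamma)\nu^2 \,\Big|\, \frac{\|\mathbf{g}\|^2}{P_1} \ge 2\gamma\nu^2\right) \le e^{-\frac{P_1}{4}\frac{\gamma^2}{1+4\gamma}}. \tag{135}$$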


Finally, similar to ( |136| ) with L = 2, the second term Pr ( < 2'ijv^ ) can be bounded as 


Pr I — < 2'ijv^ I < 2A^e 


-I 1- 


Pi 


(136) 


H Proof of Single-ton Search Error Probability in Lemma [§ and 

H.l Single-ton Search in the NSO-SPRIGHT Algorithm 

From the MLE in ( |3T] ), the error probability of the single-ton search for the g-th bit of k is 

Pr (k[q] 7^ k[q]^ = ( X] [Up,q\ ® sgn [Up] © k[q] < ^ sgn [Up^q] © sgn [Up] © k[q] | . (137) 


. p=i 


p=i 


Recall that sgn [Up^q] © sgn [Up] = k[q] Q ^ in ( [^ where Zp ^ is a Bernoulli variable with probability 6 = 
2Pe(l — Pe)- Therefore, we have 


Pi 


I - ^ A 

Pr (k[q] / k[q]^ = ( X] ® ® ® 

\P=1 P=i 

= pr(x;i®^;,,<E^;, 


(138) 


(139) 


^p=i 
^Pl ry! 


P=1 


Noticing that J2pLi 1 ® ^p,q = Pi - Ep=i we have 


Pr (k[q] / k[q]) = Pr | ^ > P,/2 ] < 

.p=i 


Pi 


(140)

where the inequality follows from the Hoeffding bound. By union bounding over all $n$ bits, we have


Pr (k 7^ k) <ne 


{i-2gy 


Pi 


(141) 
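The majority-vote argument behind (140) and (141) can be checked numerically. The sketch below is an illustration under stated assumptions: each of the $P_1$ sign comparisons is flipped independently with probability $\theta = 2P_e(1-P_e)$, the exact error of the majority vote is a binomial tail, and it is compared against the textbook Hoeffding bound $\exp(-(1-2\theta)^2 P_1/2)$. The constant in the exponent of the bound stated in the text may be looser than this one, and the function names are hypothetical.

```python
import math


def flip_probability(p_e):
    """theta = 2 p_e (1 - p_e): exactly one of two noisy signs is flipped."""
    return 2 * p_e * (1 - p_e)


def majority_error(P, theta):
    """Exact Pr(at least P/2 of P independent Bernoulli(theta) flips occur)."""
    start = math.ceil(P / 2)
    return sum(
        math.comb(P, j) * theta ** j * (1 - theta) ** (P - j)
        for j in range(start, P + 1)
    )


def hoeffding_bound(P, theta):
    """Standard Hoeffding bound exp(-(1 - 2 theta)^2 P / 2), for theta < 1/2."""
    return math.exp(-((1 - 2 * theta) ** 2) * P / 2)


def index_error_bound(n, P, theta):
    """Union bound over the n bits of the index."""
    return n * hoeffding_bound(P, theta)


if __name__ == "__main__":
    for p_e in (0.05, 0.1, 0.2):
        theta = flip_probability(p_e)
        for P in (10, 50, 200):
            print(p_e, P, majority_error(P, theta), hoeffding_bound(P, theta))
```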


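H.2 Single-ton Search in the SO-SPRIGHT Algorithm

In the general setting, the index is decoded after obtaining the sign $\widehat{\mathrm{sgn}}[X[\mathbf{k}]]$. Therefore, we have

$$\Pr(\widehat{\mathbf{k}} \ne \mathbf{k}) = \Pr\left(\widehat{\mathbf{k}} \ne \mathbf{k} \mid \widehat{\mathrm{sgn}}[X[\mathbf{k}]] = \mathrm{sgn}[X[\mathbf{k}]]\right) \Pr\left(\widehat{\mathrm{sgn}}[X[\mathbf{k}]] = \mathrm{sgn}[X[\mathbf{k}]]\right) \tag{142}$$
$$\quad + \Pr\left(\widehat{\mathbf{k}} \ne \mathbf{k} \mid \widehat{\mathrm{sgn}}[X[\mathbf{k}]] \ne \mathrm{sgn}[X[\mathbf{k}]]\right) \Pr\left(\widehat{\mathrm{sgn}}[X[\mathbf{k}]] \ne \mathrm{sgn}[X[\mathbf{k}]]\right) \tag{143}$$
$$\le \Pr\left(\widehat{\mathbf{k}} \ne \mathbf{k} \mid \widehat{\mathrm{sgn}}[X[\mathbf{k}]] = \mathrm{sgn}[X[\mathbf{k}]]\right) + \Pr\left(\widehat{\mathrm{sgn}}[X[\mathbf{k}]] \ne \mathrm{sgn}[X[\mathbf{k}]]\right). \tag{144}$$

If the codebook $\mathbf{G}$ with block length $P_3 = O(n)$ has a minimum distance of $\beta P_3$ such that $\beta > P_e$, then $\mathbf{k}$ fails to
be decoded when there are more than $\beta P_3$ sign flips. This can be bounded for the BSC($P_e$) by the Chernoff bound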


Pr k ^ k 


s'gh [X[k]] < 


(/3/Pe-l)^ p^ 


(145) 


Since the sign is obtained from $P_2$ sign observations through a majority test if $P_e < 1/2$ and a minority test if
$P_e > 1/2$, the error in mistaking the sign can be bounded similarly to (140) as


Pr (s'gh [X[k]] 7^ sgn [X[k]]) < e' 


(l-2Pe)^ 


P 2 


(146) 


I Tail Bounds 

Here we derive some tail bounds that are useful in our analysis. 


Lemma 10 (Non-central Chi-Square Tail Bounds in [25]). Let $Z \sim \chi_D^2$ be a non-central chi-square variable with
$D$ degrees of freedom and non-centrality parameter $\theta \ge 0$. Then for all $z > 0$, the following tail bounds hold:

$$\Pr\left(Z \ge (D + \theta) + 2\sqrt{(D + 2\theta)z} + 2z\right) \le \exp(-z),$$
$$\Pr\left(Z \le (D + \theta) - 2\sqrt{(D + 2\theta)z}\right) \le \exp(-z).$$
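A small Monte Carlo experiment gives a feel for how conservative the two bounds of Lemma 10 are. The sketch below is an illustration only: it simulates a non-central chi-square variable as a sum of squared shifted standard normals, with hypothetical parameters $D = 8$, $\theta = 4$, $z = 2$, a fixed seed, and made-up function names; it estimates both tail frequencies and compares them with $e^{-z}$.

```python
import math
import random


def noncentral_chi2_samples(D, theta, trials, seed=0):
    """Draw samples of sum_{d=1}^{D} (m + G_d)^2 with G_d ~ N(0, 1).

    The common shift m = sqrt(theta / D) spreads the non-centrality evenly,
    so that the sum of squared means equals theta.
    """
    rng = random.Random(seed)
    shift = math.sqrt(theta / D)
    return [
        sum((shift + rng.gauss(0.0, 1.0)) ** 2 for _ in range(D))
        for _ in range(trials)
    ]


def upper_threshold(D, theta, z):
    """(D + theta) + 2 sqrt((D + 2 theta) z) + 2 z."""
    return (D + theta) + 2 * math.sqrt((D + 2 * theta) * z) + 2 * z


def lower_threshold(D, theta, z):
    """(D + theta) - 2 sqrt((D + 2 theta) z)."""
    return (D + theta) - 2 * math.sqrt((D + 2 * theta) * z)


D, theta, z = 8, 4.0, 2.0
samples = noncentral_chi2_samples(D, theta, trials=20000)
upper_freq = sum(s >= upper_threshold(D, theta, z) for s in samples) / len(samples)
lower_freq = sum(s <= lower_threshold(D, theta, z) for s in samples) / len(samples)

if __name__ == "__main__":
    print(upper_freq, lower_freq, math.exp(-z))
```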

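Lemma 11. Given $\mathbf{g} = [g[0], \cdots, g[P-1]]^T$ and a vector $\mathbf{v} = [v[0], \cdots, v[P-1]]^T$ with i.i.d. Gaussian variates
$v[p] \sim \mathcal{N}(0, \nu^2)$ for all $p \in [P]$, the following tail bounds hold:

$$\Pr\left(\frac{1}{P}\|\mathbf{g} + \mathbf{v}\|^2 \ge \tau_1\right) \le e^{-\frac{P}{4}\left(\sqrt{2\tau_1/\nu^2 - 1} - \sqrt{1 + 2\theta_0}\right)^2}, \tag{147}$$
$$\Pr\left(\frac{1}{P}\|\mathbf{g} + \mathbf{v}\|^2 \le \tau_2\right) \le e^{-\frac{P}{4}\frac{\left(1 + \theta_0 - \tau_2/\nu^2\right)^2}{1 + 2\theta_0}}, \tag{148}$$

for any $\tau_1$ and $\tau_2$ that satisfy

$$\tau_1 \ge \nu^2(1 + \theta_0), \quad \tau_2 \le \nu^2(1 + \theta_0), \tag{149}$$

where $\theta_0$ is the normalized non-centrality parameter given by

$$\theta_0 := \frac{\|\mathbf{g}\|^2}{P\nu^2}. \tag{150}$$

Proof. The quantity $\|\mathbf{g} + \mathbf{v}\|^2$ can be written element-wise as

$$\|\mathbf{g} + \mathbf{v}\|^2 = \sum_{p=0}^{P-1} (g[p] + v[p])^2, \tag{151}$$

where each term $g[p] + v[p]$ is a normal random variable with mean $g[p]$ and variance $\nu^2$. Therefore, according to the
definition of non-central chi-square variables, the quantity

$$\frac{\|\mathbf{g} + \mathbf{v}\|^2}{\nu^2} \sim \chi_P^2 \tag{152}$$

is a non-central chi-square random variable of $P$ degrees of freedom with a non-centrality parameter

$$\theta = \sum_{p=0}^{P-1} \left(\frac{g[p]}{\nu}\right)^2 = \frac{\|\mathbf{g}\|^2}{\nu^2}. \tag{153}$$

For notational convenience, we use the normalized non-centrality parameter $\theta_0$ in (150) such that $\theta = P\theta_0$. Without
loss of generality, let the thresholds $\tau_1$ and $\tau_2$ take the following form with respect to $z_1$ and $z_2$:

$$\tau_1 = \frac{\nu^2}{P}\left[(P + P\theta_0) + 2\sqrt{(P + 2P\theta_0)z_1} + 2z_1\right],$$
$$\tau_2 = \frac{\nu^2}{P}\left[(P + P\theta_0) - 2\sqrt{(P + 2P\theta_0)z_2}\right],$$

then the tail bounds in Lemma 10 can be obtained easily with respect to $z_1$ and $z_2$. Using (153), the corresponding
$z_1$ and $z_2$ can be solved as

$$z_1 = \frac{P}{4}\left(\sqrt{2\tau_1/\nu^2 - 1} - \sqrt{1 + 2\theta_0}\right)^2, \quad z_2 = \frac{P}{4}\frac{\left(1 + \theta_0 - \tau_2/\nu^2\right)^2}{1 + 2\theta_0},$$

as long as the thresholds $\tau_1$ and $\tau_2$ satisfy (149). Thus according to Lemma 10, we have the tail bounds in (147) and (148). □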
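The change of variables in the proof of Lemma 11 can be verified with a short round-trip check: starting from a threshold $\tau_1$ (or $\tau_2$), compute $z_1$ (or $z_2$) from the closed form and map it back through the Lemma 10 parametrization. This is a minimal sketch; the function names and the numerical values ($P = 32$, $\theta_0 = 0.5$, $\nu^2 = 2$) are hypothetical and chosen only so that condition (149) holds.

```python
import math


def z_upper(P, theta0, tau1, nu2):
    """z_1 = (P/4) (sqrt(2 tau1 / nu^2 - 1) - sqrt(1 + 2 theta0))^2."""
    return P / 4 * (math.sqrt(2 * tau1 / nu2 - 1) - math.sqrt(1 + 2 * theta0)) ** 2


def z_lower(P, theta0, tau2, nu2):
    """z_2 = (P/4) (1 + theta0 - tau2 / nu^2)^2 / (1 + 2 theta0)."""
    return P / 4 * (1 + theta0 - tau2 / nu2) ** 2 / (1 + 2 * theta0)


def tau_from_z_upper(P, theta0, z, nu2):
    """Upper threshold of Lemma 10 rescaled by nu^2 / P, with theta = P theta0."""
    return nu2 / P * ((P + P * theta0) + 2 * math.sqrt((P + 2 * P * theta0) * z) + 2 * z)


def tau_from_z_lower(P, theta0, z, nu2):
    """Lower threshold of Lemma 10 rescaled by nu^2 / P, with theta = P theta0."""
    return nu2 / P * ((P + P * theta0) - 2 * math.sqrt((P + 2 * P * theta0) * z))


if __name__ == "__main__":
    P, theta0, nu2 = 32, 0.5, 2.0
    tau1, tau2 = 4.5, 2.0          # tau1 >= nu2 * (1 + theta0) >= tau2
    print(tau_from_z_upper(P, theta0, z_upper(P, theta0, tau1, nu2), nu2))
    print(tau_from_z_lower(P, theta0, z_lower(P, theta0, tau2, nu2), nu2))
```

Setting $\theta_0 = 0$ and $\tau_1 = (1+\gamma)\nu^2$ in `z_upper` gives the exponent $\frac{P}{4}(\sqrt{1+2\gamma} - 1)^2$ that appears in the zero-ton analysis.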

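Corollary 1. Suppose that the normalized non-centrality parameter $\theta_0$ in Lemma 11 is bounded between

$$0 \le \theta_{\min} \le \theta_0 \le \theta_{\max}, \tag{154}$$

then the following worst case tail bounds hold:

$$\Pr\left(\frac{1}{P}\|\mathbf{g} + \mathbf{v}\|^2 \ge \tau_1\right) \le e^{-\frac{P}{4}\left(\sqrt{2\tau_1/\nu^2 - 1} - \sqrt{1 + 2\theta_{\max}}\right)^2},$$
$$\Pr\left(\frac{1}{P}\|\mathbf{g} + \mathbf{v}\|^2 \le \tau_2\right) \le e^{-\frac{P}{4}\frac{\left(1 + \theta_{\min} - \tau_2/\nu^2\right)^2}{1 + 2\theta_{\min}}},$$

for any $\tau_1$ and $\tau_2$ that satisfy

$$\tau_1 \ge \nu^2(1 + \theta_{\max}), \quad \tau_2 \le \nu^2(1 + \theta_{\min}). \tag{155}$$

Proof. The first tail bound can be easily obtained since $\tau_1 \ge \nu^2(1 + \theta_{\max})$; the exponent is monotonically decreasing
with respect to $\theta_0$, and therefore substituting it with $\theta_{\max}$ leads to an upper bound.

The second tail bound depends on the monotonicity with respect to $\theta_0$. The tail bound is monotonic with
respect to the exponent, so in the following we examine the monotonicity of the exponent with respect to $\theta_0$. The
exponent can be re-written in the form of the $x + 1/x$ function:

$$\frac{\left(1 + \theta_0 - \tau_2/\nu^2\right)^2}{1 + 2\theta_0} = \frac{1}{2}\left[\left(\theta_0 + \frac{1}{2}\right) + \frac{\left(\frac{1}{2} - \frac{\tau_2}{\nu^2}\right)^2}{\theta_0 + \frac{1}{2}}\right] + \left(\frac{1}{2} - \frac{\tau_2}{\nu^2}\right), \tag{156}$$

which has a minimum at

$$\theta_0^* = \left|\frac{\tau_2}{\nu^2} - \frac{1}{2}\right| - \frac{1}{2} \tag{157}$$

and is monotonically increasing for any $\theta_0 > \theta_0^*$. Now it remains to see whether $\theta_0^*$ is within the interval $[\theta_{\min}, \theta_{\max}]$,
which needs to be discussed separately depending on the choice of $\tau_2$.

1. $\nu^2/2 \le \tau_2 \le \nu^2(1 + \theta_{\min})$: in this case, we have

$$\theta_0^* = \frac{\tau_2}{\nu^2} - 1 \le \theta_{\min}. \tag{158}$$

2. $0 \le \tau_2 < \nu^2/2$: in this case, we have

$$\theta_0^* = -\frac{\tau_2}{\nu^2} \le 0 \le \theta_{\min}. \tag{159}$$

Therefore, it has been shown that as long as $\tau_2$ satisfies (155), the exponent is monotonically increasing with
respect to $\theta_0 \in [\theta_{\min}, \theta_{\max}]$ and therefore the minimum exponent is achieved by substituting $\theta_0$ with $\theta_{\min}$. □

[Figure panels: "NSO-SPRIGHT Algorithm: Sample Complexity" and "SO-SPRIGHT Algorithm: Sample Complexity"]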



[Plots omitted; horizontal axis: $n$]

(e) NSO-SPRIGHT: signal length $N = 2^n$ increases by 1000 fold while the sample complexity increases by 5 fold.

(f) SO-SPRIGHT: signal length $N = 2^n$ increases by 1000 fold while the sample complexity increases by 3 fold.

[Figure panels: "NSO-SPRIGHT Algorithm: Run-Time Performance" and "SO-SPRIGHT Algorithm: Run-Time Performance"; plots omitted; horizontal axis: $n$]

(g) NSO-SPRIGHT: signal length $N = 2^n$ increases by 1000 fold while the run-time increases by at most 6 fold.

(h) SO-SPRIGHT: signal length $N = 2^n$ increases by 1000 fold while the run-time increases by at most 2 fold.

Figure 8: The plot shows the scaling of the sample complexity and run-time of the NSO-SPRIGHT and SO-SPRIGHT
algorithms for inputs with varying dimensions $N = 2^n$. With probability of success exceeding 0.95 and sparsity $K =
10, 20, 40$ at a constant SNR of 10 dB, both the sample complexity and the run-time of the NSO-SPRIGHT and SO-SPRIGHT
algorithms scale sub-linearly in $N$ (i.e. linear in n^).